
methodology(streaming): PROVEN (measured, not guessed) MEOS-parity harness + Nebula adapter #44

Open
estebanzimanyi wants to merge 66 commits into MobilityDB:main from estebanzimanyi:feat/nebula-streaming-parity-harness


Summary

The streaming-platform sibling of MobilityDB's cross-type parity methodology (doc/methodology/cross_type_parity.md, #1002) and audit harness (tools/parity_audit/, #1110). It answers, per streaming platform (NebulaStream · Flink · Kafka): how many of the streamable MEOS functions does it actually support — proven by a passing test?

Why

Streaming parity was previously reported as a guessed number ("2,097/2,097 wirable", "27/27 cells full") that measured wirability or registration, not whether anything ran. When the operators were finally executed, it turned out that the systests had never been run at all. Wirable ≠ wired ≠ working.

Three-layer backing gate (only L3 counts)

| Layer | Meaning | Introspection |
|---|---|---|
| L1 EXPORTED | symbol in pinned libmeos | nm -D |
| L2 WIRED | an operator/UDF calls it | Nebula meos_call + SQL token; Flink/Kafka facade method |
| L3 PROVEN | a test exercising it passes | Nebula systest; Flink/Kafka JUnit |
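
The gate reads as a strict ladder: a function reaches a layer only if every lower layer also holds. A minimal Python sketch, assuming the three sets have already been collected (for example from `nm -D` output, a scan of operator sources, and the list of passing tests). The function name, set names, and sample memberships are hypothetical and are not part of the harness:

```python
def backing_layer(fn, exported, wired, proven):
    """Return the highest layer a MEOS function reaches: 'L3', 'L2', 'L1' or None.

    Each layer requires all lower layers, so a test that names a symbol
    missing from the pinned libmeos does not make that symbol PROVEN.
    """
    if fn not in exported:
        return None
    if fn not in wired:
        return "L1"
    if fn not in proven:
        return "L2"
    return "L3"


# Illustrative memberships only; these are not measured results.
exported = {"edwithin_tgeo_geo", "nad_tgeo_geo", "tpoint_length"}
wired = {"edwithin_tgeo_geo", "nad_tgeo_geo"}
proven = {"edwithin_tgeo_geo"}

layers = {fn: backing_layer(fn, exported, wired, proven) for fn in sorted(exported)}
print(layers)
```

Only functions mapped to `"L3"` would count toward the parity number.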

Instrument — accumulated-PR builds

Measure over a build with all open PRs merged, not over stale master: the shortfall is un-accumulated PRs, not unimplemented work.

Reference surface + reason-marked exclusions

Streamable MEOS public API = 1,949 fns (stateless/bounded-state/windowed/cross-stream, from the v4 classifier over meos-idl.json). io-meta/sequence-only/ambiguous/internal are reason-marked non-streamable — never gaps (the streaming sibling of the DB methodology's semantic/structural exclusions).

Measured NebulaStream baseline (accumulated build, all 28 systests green)

| layer | count | % of 1,949 |
|---|---|---|
| L3 PROVEN | 8 | 0.4% |
| L2 wired-only | 77 | 4.0% |
| gap | 1,864 | 95.6% |

The honest replacement for the former "100% wirable" headline.
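
The percentages follow directly from the three counts over the 1,949-function reference surface. A short Python check of that arithmetic (the variable and function names are illustrative):

```python
STREAMABLE = 1949  # streamable MEOS public API functions (reference surface)

counts = {"L3 PROVEN": 8, "L2 wired-only": 77, "gap": 1864}

# The three buckets partition the reference surface.
assert sum(counts.values()) == STREAMABLE


def share(count, total=STREAMABLE):
    """Percentage of the reference surface, rounded to one decimal."""
    return round(100.0 * count / total, 1)


shares = {layer: share(n) for layer, n in counts.items()}
for layer, pct in shares.items():
    print(f"{layer:14s} {counts[layer]:5d}  {pct:5.1f}%")
```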

Next

Flink/Kafka adapters (javap/reflection × JUnit), per-operator test generation to turn L2→L3 in bulk, and a CI parity gate that makes a false 100% impossible by construction.

… on NebulaStream (33 YAMLs, 27/27 cells)

Additive scaffold for the BerlinMOD-9 × 3 streaming-form parity contract
on MobilityNebula, sibling to the existing SNCB Q-series and matching
the MobilityFlink MobilityDB#3 / MobilityKafka MobilityDB#1 streaming-form definitions.

All 27 cells covered:

  Q1 'which vehicles have appeared'      — full (continuous + windowed + snapshot)
  Q2 'where is vehicle X at time T'      — full
  Q3 'vehicles within 5 km of P'         — full
  Q4 'vehicles inside region R (polygon)'— full
  Q5 'pairs of vehicles meeting near P'  — partial (emit per-vehicle trajectories near P; consumer joins)
  Q6 'cumulative distance per vehicle'   — partial (emit TEMPORAL_SEQUENCE; consumer computes length)
  Q7 'first passage of vehicle through POI' × {POI1, POI2, POI3}
                                          — full (per-POI fan-out)
  Q8 'vehicles within d of LINESTRING'   — full (edwithin_tgeo_geo with LINESTRING geometry)
  Q9 'distance between X and Y at time T'— partial (emit X and Y trajectories; consumer joins)

18 of 27 cells are FULL (the BerlinMOD-Q semantic is computed entirely
inside NebulaStream). 9 cells are PARTIAL — NebulaStream emits the
per-window inputs (trajectory, candidate vehicles) and a consumer
post-processes for the final BerlinMOD-Q answer. The partial pattern
is the natural expression of these queries in NebulaStream's current
SQL surface; the path to FULL is documented per-Q in
docs/berlinmod-streaming-forms.md (a stream-self-join for Q5/Q9, a
temporal_length scalar function for Q6).

Form mapping to NebulaStream windows:

  continuous: SLIDING(time_utc, SIZE 1 SEC, ADVANCE BY 1 SEC)
  windowed:   TUMBLING(time_utc, SIZE 10 SEC)
  snapshot:   TUMBLING(time_utc, SIZE 5 SEC)
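
To make the three forms concrete, here is a Python sketch of generic tumbling and sliding window assignment over integer-second timestamps. It assumes half-open windows `[start, end)` aligned to multiples of the size or advance; NebulaStream's exact boundary semantics are not stated here, so treat this as an illustration rather than its implementation:

```python
def tumbling_window(ts, size):
    """The single [start, end) window of length `size` that contains ts."""
    start = (ts // size) * size
    return (start, start + size)


def sliding_windows(ts, size, advance):
    """All [start, end) windows of length `size`, started every `advance`, that contain ts."""
    windows = []
    start = (ts // advance) * advance  # latest window start <= ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= advance
    return sorted(windows)


# The form mapping above, expressed as (kind, size, advance) in seconds.
FORMS = {
    "continuous": ("sliding", 1, 1),
    "windowed": ("tumbling", 10, None),
    "snapshot": ("tumbling", 5, None),
}

print(sliding_windows(12, 1, 1))  # continuous
print(tumbling_window(12, 10))    # windowed
print(tumbling_window(12, 5))     # snapshot
```

With SIZE equal to ADVANCE BY, the sliding form places each event in exactly one window, which is why it serves as the "continuous" form at one-second granularity.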

MEOS-side surface consumed (already exposed by PR MobilityDB#14 + follow-ups):

  edwithin_tgeo_geo — Q3 (POINT predicate), Q4 (POLYGON, d=0.0),
                      Q5 (POINT predicate), Q7 (per-POI POINT),
                      Q8 (LINESTRING predicate)
  TEMPORAL_SEQUENCE — Q2 / Q5 / Q6 / Q9 (per-window per-vehicle trajectory)

No new MEOS PhysicalFunction classes added; no C++ changes; no SNCB
Q-series modifications. All 33 YAMLs are additive in a new
Queries/berlinmod/ subdirectory.

Additions:
  Queries/berlinmod/q1_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q2_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q3_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q4_{continuous,windowed,snapshot}.yaml          (3)
  Queries/berlinmod/q5_{continuous,windowed,snapshot}.yaml          (3, partial)
  Queries/berlinmod/q6_{continuous,windowed,snapshot}.yaml          (3, partial)
  Queries/berlinmod/q7_poi{1,2,3}_{continuous,windowed,snapshot}.yaml (9, full via fan-out)
  Queries/berlinmod/q8_{continuous,windowed,snapshot}.yaml          (3, LINESTRING predicate)
  Queries/berlinmod/q9_{continuous,windowed,snapshot}.yaml          (3, partial)
  Input/input_berlinmod.csv  (sample data: 3 vehicles × 21 events, 14 simulated seconds)
  docs/berlinmod-streaming-forms.md

Validation: every YAML parses cleanly via python3 yaml.safe_load.
Runtime verification gated on the NebulaStream test harness.
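
The parse-only check can be sketched as follows. This assumes PyYAML is installed for real use; the helper name and the injectable `load` parameter are illustrative choices (the latter lets the helper be exercised without PyYAML), not the script actually used:

```python
def check_yaml_files(paths, load=None):
    """Return {path: error message} for every file that fails to parse.

    `load` defaults to PyYAML's yaml.safe_load; it is injectable so the
    helper can be exercised with a stub loader.
    """
    if load is None:
        import yaml  # PyYAML, imported lazily

        load = yaml.safe_load
    failures = {}
    for path in paths:
        try:
            with open(path, encoding="utf-8") as handle:
                load(handle.read())
        except Exception as exc:  # parse errors and I/O errors alike
            failures[path] = f"{type(exc).__name__}: {exc}"
    return failures
```

A run over the new directory would be `check_yaml_files(sorted(glob.glob("Queries/berlinmod/*.yaml")))`, with an empty result meaning every file parsed. This only shows that the files are well-formed YAML; it says nothing about whether the queries execute.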

Coverage: 27 of 27 cells (100 %), with 18 FULL and 9 PARTIAL annotated
explicitly per Q. Path to FULL for the 9 PARTIAL cells is one
MobilityNebula C++ PhysicalFunction class each (or a NebulaStream
upstream stream-self-join), documented in
docs/berlinmod-streaming-forms.md.
…-form cells to full

Adds the TEMPORAL_LENGTH aggregation across the four levels of the
NebulaStream pipeline (logical / physical / parser / lowering) so the
BerlinMOD-Q6 "cumulative distance per vehicle" streaming-form cells
(continuous + windowed + snapshot) compute the spheroidal trajectory
length entirely inside NebulaStream instead of emitting raw trajectories
for a consumer-side reduction.

Logical: nes-logical-operators/{include,src}/Operators/Windows/Aggregations/Meos/TemporalLengthAggregationLogicalFunction.{hpp,cpp}
mirroring TemporalSequenceAggregationLogicalFunctionV2 but with finalAggregateStampType = FLOAT64.
Registers as "TemporalLength" in the aggregation registry. Serializes through the existing
TemporalAggregationSerde wire shape with the type tag overridden.

Physical: nes-physical-operators/{include,src}/Aggregation/Function/Meos/TemporalLengthAggregationPhysicalFunction.{hpp,cpp}
identical lift / combine / reset / cleanup to TemporalSequenceAggregationPhysicalFunction;
the lower() path builds the same MEOS instant-set trajectory string, parses it via
MEOSWrapper::parseTemporalPoint, and calls MEOS' tpoint_length(Temporal*) to return a single
FLOAT64 result.

Parser: nes-sql-parser/AntlrSQL.g4 adds the TEMPORAL_LENGTH lexer token and includes it in
functionName. AntlrSQLQueryPlanCreator.cpp adds the TEMPORAL_LENGTH dispatch in both the
case-label and string-name paths, parallel to TEMPORAL_SEQUENCE.

Lowering: nes-query-optimizer/src/RewriteRules/LowerToPhysical/LowerToPhysicalWindowedAggregation.cpp
adds the TEMPORAL_LENGTH special-case lowering, parallel to TEMPORAL_SEQUENCE, producing a
TemporalLengthAggregationPhysicalFunction with the same (lon, lat, timestamp) state schema.

YAMLs: Queries/berlinmod/q6_{continuous,windowed,snapshot}.yaml updated to call
TEMPORAL_LENGTH directly; the FLOAT64 output column replaces the VARSIZED trajectory output;
header comments updated to "FULL".

Docs: docs/berlinmod-streaming-forms.md updated to reflect 21 cells full + 6 cells partial
(Q5 + Q9 only); the path-to-full table now lists those two queries only.

YAML safe_load green on all 3 Q6 cells. Build verification gated on the user's NebulaStream
test harness (vcpkg-bootstrapped); the C++ code follows the established TemporalSequence
template exactly, with the lower() path replaced by tpoint_length.
…streaming-form cells to full

Mirrors the TEMPORAL_LENGTH pattern from the parent PR with two new
four-field aggregations that close the last 6 partial cells on the
MobilityNebula BerlinMOD parity matrix:

PAIR_MEETING(lon, lat, ts, vehicle_id) -> VARSIZED
  Lift collects per-event tuples. Lower picks each vehicle's latest known
  position in the window, enumerates pairs (a < b), calls MEOS' geog_dwithin
  with dMeet = 200 m hardcoded for the BerlinMOD scaffold, and emits a
  string-encoded list of meeting pairs (vid_a, vid_b, ts, "<=dMeet" tag).
  Future PR can parameterize dMeet via a constant input. Closes Q5 × 3 cells.
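
  The lower step can be sketched in Python as follows. This is an illustration under stated assumptions, not the C++ operator: a haversine great-circle distance stands in for MEOS' geog_dwithin, the emitted timestamp is taken as the later of the two positions, and the sample coordinates are invented.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres (stand-in for geog_dwithin)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pair_meeting(events, d_meet=200.0):
    """events: iterable of (lon, lat, ts, vehicle_id) tuples from one window."""
    latest = {}
    for lon, lat, ts, vid in events:  # keep each vehicle's latest known position
        if vid not in latest or ts >= latest[vid][2]:
            latest[vid] = (lon, lat, ts)
    pairs = []
    for a, b in combinations(sorted(latest), 2):  # pairs with a < b
        lon_a, lat_a, ts_a = latest[a]
        lon_b, lat_b, ts_b = latest[b]
        if haversine_m(lon_a, lat_a, lon_b, lat_b) <= d_meet:
            pairs.append((a, b, max(ts_a, ts_b)))  # timestamp choice is an assumption
    return pairs


events = [
    (4.3500, 50.8500, 4, 1),
    (4.4000, 50.8500, 1, 2),  # older position of vehicle 2, superseded below
    (4.3510, 50.8500, 5, 2),
    (4.4000, 50.9000, 5, 3),
]
print(pair_meeting(events))
```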

CROSS_DISTANCE(lon, lat, ts, vehicle_id) -> FLOAT64
  Same lift shape. Lower picks the latest known position of each of the two
  target vehicles (VID_A = 100, VID_B = 200 hardcoded), drives the MEOS
  nad_tgeo_tgeo distance, and returns a FLOAT64 (NaN if either vehicle is
  unobserved). Future PR can parameterize (VID_A, VID_B). Closes Q9 × 3 cells.
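
  A Python sketch of the same lower step, again as an illustration only: haversine distance stands in for the MEOS nad_tgeo_tgeo call, and the sample coordinates are invented.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres (stand-in for the MEOS distance call)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cross_distance(events, vid_a=100, vid_b=200):
    """events: iterable of (lon, lat, ts, vehicle_id); returns metres or NaN."""
    latest = {}
    for lon, lat, ts, vid in events:
        if vid in (vid_a, vid_b) and (vid not in latest or ts >= latest[vid][2]):
            latest[vid] = (lon, lat, ts)
    if vid_a not in latest or vid_b not in latest:
        return math.nan  # either target vehicle unobserved in the window
    lon_a, lat_a, _ = latest[vid_a]
    lon_b, lat_b, _ = latest[vid_b]
    return haversine_m(lon_a, lat_a, lon_b, lat_b)


events = [
    (4.3500, 50.8500, 4, 100),
    (4.3510, 50.8500, 5, 200),
    (4.4000, 50.9000, 5, 300),  # not a target vehicle, ignored
]
print(cross_distance(events))
```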

Wired across the four pipeline layers identically to TEMPORAL_LENGTH:
  - nes-physical-operators/{include,src}/Aggregation/Function/Meos/{PairMeeting,CrossDistance}AggregationPhysicalFunction.{hpp,cpp}
  - nes-logical-operators/{include,src}/Operators/Windows/Aggregations/Meos/{PairMeeting,CrossDistance}AggregationLogicalFunction.{hpp,cpp}
  - nes-physical-operators/src/Aggregation/Function/Meos/CMakeLists.txt + nes-logical-operators/src/Operators/Windows/Aggregations/Meos/CMakeLists.txt plugin entries
  - nes-sql-parser/AntlrSQL.g4 lexer + functionName tokens
  - nes-sql-parser/src/AntlrSQLQueryPlanCreator.cpp case-label + string-name dispatch
  - nes-query-optimizer/src/RewriteRules/LowerToPhysical/LowerToPhysicalWindowedAggregation.cpp special-case lowering with 4-field state schema

YAMLs: Queries/berlinmod/q5_{continuous,windowed,snapshot}.yaml and
q9_{continuous,windowed,snapshot}.yaml rewritten to call the new
aggregations directly; sink schemas updated to FLOAT64 / VARSIZED;
header comments updated to FULL.

Docs: docs/berlinmod-streaming-forms.md updated to reflect 27/27 cells
full (was 21 full + 6 partial); MEOS-operators table now lists
PAIR_MEETING and CROSS_DISTANCE alongside the existing ones.

YAML safe_load green on all 6 rewritten Q5/Q9 cells. C++ follows the
established TemporalLength template from the parent MobilityDB#16; build
verification gated on the user's NebulaStream test harness.
… covered' section

After PR MobilityDB#16 (TEMPORAL_LENGTH closes Q6) and PR MobilityDB#17 (PAIR_MEETING +
CROSS_DISTANCE close Q5 + Q9), the parity matrix is 27/27 full —
the doc's own coverage table at the top confirms it. But the
section 'Not covered (15 cells / 5 queries)' at line 77 was a
remnant from the pre-MobilityDB#16/MobilityDB#17 state and contradicts the rest of the
doc. Remove it.

Add a new 'Streaming-semantics tier overlay' section that classifies
each BerlinMOD-Q by its streaming-execution tier (stateless /
bounded-state / windowed / cross-stream) per the closed 7-value
vocabulary proposed for the MEOS-API objectModel.streamingSemantics
facet (see the sibling RFC on MEOS-API PR MobilityDB#10). The mapping makes
the cross-binding picture explicit: a Q's tier on NebulaStream is
the same tier on Flink / Kafka, and the table points to the
equivalent generic wiring class on Flink for each tier.

Two short follow-up notes explain why cross-stream looks different
on NebulaStream (single-aggregation Cartesian enumeration vs Flink's
interval-join across two streams — same semantic, different
topology) and why Q7 is bounded-state rather than windowed (per-POI
fan-out, per-(vehicle, POI) bounded state, no full-sequence
reduction needed).

Refresh the 'Sibling parity references' section to point at the
current state of the Flink and Kafka work — Flink's per-tier wiring
infrastructure under org.mobilitydb.flink.meos.wirings (5 generic
classes covering 100% of the streamable surface) and Kafka's codegen
mirror under org.mobilitydb.kafka.meos. Drops stale PR-number
references per the same as-is / no-internal-process discipline
applied elsewhere in the ecosystem docs.

Stacks on PR MobilityDB#17. Docs-only; touches no YAML, no C++ pipeline-layer
file.

The PAIR_MEETING aggregation (added in MobilityDB#17) hardcoded the meeting-distance
threshold at 200 m via a static constexpr DMEET_METRES, with the PR body
noting parameterization as future work. This PR lands that future work:
PAIR_MEETING now takes a fifth argument — a numeric constant in metres —
and the physical operator uses it per-query.

## Surface

  PAIR_MEETING(lon, lat, ts, vehicle_id, dMeet)
                                          ^^^^^ new fifth arg (numeric constant, metres)

The first four args remain FieldAccess (lon, lat, ts, vehicle_id); the
fifth is pulled from the parser's constantBuilder as a numeric literal,
parsed via std::stod, and threaded through the logical→physical lowering
chain into the lower() lambda alongside the existing state pointers.

## Files (9, all stacked on MobilityDB#18 → MobilityDB#17 → MobilityDB#16 → MobilityDB#15)

| Layer | File |
|---|---|
| Physical .hpp | PairMeetingAggregationPhysicalFunction.hpp — `DMEET_METRES` constexpr → `DEFAULT_DMEET_METRES` + instance field `dMeetMetres` |
| Physical .cpp | PairMeetingAggregationPhysicalFunction.cpp — constructor takes dMeet; lower() passes it to the captureless lambda via `nautilus::val<double>` |
| Logical .hpp  | PairMeetingAggregationLogicalFunction.hpp — constructor + create() factory take dMeet; getter `getDMeetMetres()` |
| Logical .cpp  | PairMeetingAggregationLogicalFunction.cpp — initialize field; Registrar deserialize path uses DEFAULT_DMEET_METRES (see Serde caveat below) |
| Parser        | AntlrSQLQueryPlanCreator.cpp — both PAIR_MEETING dispatch sites (lexer-token case + funcName string-name case) extract the constant from constantBuilder, std::stod it, pass to create() |
| Lowering      | LowerToPhysicalWindowedAggregation.cpp — pmDescriptor->getDMeetMetres() flows to the physical constructor |
| YAMLs (×3)    | Queries/berlinmod/q5_continuous.yaml, q5_snapshot.yaml, q5_windowed.yaml — add `, 200.0` as the explicit fifth arg; comments updated to reflect the parameterization |

## Serde round-trip caveat (out of scope for this PR)

`AggregationLogicalFunctionRegistryArguments` is strongly typed to
`vector<FieldAccessLogicalFunction>` — there is no slot for a numeric
constant in the existing Registrar interface, and
`SerializableAggregationFunction` has no proto field for it either. As
a result:

- The parser path (live query execution) is FULLY parameterized — dMeet
  flows from SQL to physical correctly.
- The Serde deserialize path falls back to DEFAULT_DMEET_METRES
  (preserves the 200 m scaffold behaviour). Round-trip fidelity for the
  dMeet value requires (a) adding a new field to
  SerializableAggregationFunction.proto, (b) extending
  AggregationLogicalFunctionRegistryArguments to carry it, and (c)
  threading both through Serialize/Register. That's an infrastructure
  change touching every registered aggregation; tracked as a follow-up.
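
The effect of the caveat can be shown with a toy Python model. The class and function names below are hypothetical and only mimic the shape described above (a wire format with slots for field names but none for the constant); they are not the NebulaStream classes:

```python
from dataclasses import dataclass

DEFAULT_DMEET_METRES = 200.0


@dataclass(frozen=True)
class PairMeeting:
    fields: tuple
    d_meet_metres: float = DEFAULT_DMEET_METRES


def serialize(fn):
    # The wire shape carries field names only; there is no slot for dMeet.
    return {"type": "PairMeeting", "fields": list(fn.fields)}


def deserialize(msg):
    # With no constant on the wire, the default is the only value available.
    return PairMeeting(fields=tuple(msg["fields"]))


original = PairMeeting(("lon", "lat", "ts", "vehicle_id"), d_meet_metres=50.0)
restored = deserialize(serialize(original))
print(original.d_meet_metres, restored.d_meet_metres)
```

A query parsed with dMeet = 50 would therefore come back from a Serde round trip with the 200 m default, which is the gap the follow-up is meant to close.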

## Build / test verification

Cannot compile-verify locally — NebulaStream needs the full C++23 +
vcpkg toolchain. Submitted for maintainer build verification (cc
@marianaGarcez). Expected to compile cleanly; the only construction-time
behaviour change is the constructor signature (5 params → 6 params for
physical, 5 → 6 for logical create/ctor); the only runtime behaviour
change is that dMeet is now read from the instance field instead of the
class constexpr (the lambda receives it via the nautilus::val<double>
extra arg).

## Mirrors the CROSS_DISTANCE shape

CROSS_DISTANCE (also added by MobilityDB#17, hardcoded VID_A=100, VID_B=200) has
the exact same parameterization pattern; a sibling PR can apply the
same change with (lon, lat, ts, vid, vid_a, vid_b) — 6 args total
instead of 5. Holding for separate PR.
… args

Sibling to PAIR_MEETING.dMeet parameterization (PR MobilityDB#19) — applies the
same 4-layer pattern to CROSS_DISTANCE. The aggregation (added in MobilityDB#17)
hardcoded the target vehicle pair at (100, 200) via static constexpr
VID_A / VID_B, with the PR body noting parameterization as future work.
This PR lands that future work: CROSS_DISTANCE now takes two unsigned-
integer constants as its fifth and sixth arguments, and the physical
operator uses them per-query.

## Surface

  CROSS_DISTANCE(lon, lat, ts, vehicle_id, vidA, vidB)
                                           ^^^^  ^^^^ new constants (uint64)

The first four args remain FieldAccess; vidA and vidB are pulled from
the parser's constantBuilder (two unsigned-integer literals), parsed via
std::stoull, and threaded through the logical→physical lowering chain into the
lower() lambda alongside the existing state pointer.

## Files (9, same shape as PR MobilityDB#19's PAIR_MEETING change)

| Layer | File |
|---|---|
| Physical .hpp | CrossDistanceAggregationPhysicalFunction.hpp — `VID_A/B` constexpr → `DEFAULT_VID_A/B` + instance fields `vidA/B` |
| Physical .cpp | CrossDistanceAggregationPhysicalFunction.cpp — constructor takes both; lift-time lambda gets them via `nautilus::val<uint64_t>` |
| Logical .hpp  | CrossDistanceAggregationLogicalFunction.hpp — constructor + create() factory + getters |
| Logical .cpp  | CrossDistanceAggregationLogicalFunction.cpp — initialize fields; Registrar deserialize falls back to defaults |
| Parser        | AntlrSQLQueryPlanCreator.cpp — both CROSS_DISTANCE dispatch sites extract two constants, std::stoull both, pass to create() |
| Lowering      | LowerToPhysicalWindowedAggregation.cpp — cdDescriptor->getVidA()/getVidB() flow to physical constructor |
| YAMLs (×3)    | Queries/berlinmod/q9_continuous.yaml, q9_snapshot.yaml, q9_windowed.yaml — add `, 100, 200` as explicit constants; comments updated |

## Serde round-trip caveat (same as PR MobilityDB#19)

`AggregationLogicalFunctionRegistryArguments` is strongly typed to
`vector<FieldAccessLogicalFunction>` — no slot for integer constants.
`SerializableAggregationFunction.proto` has no field for them. So:

- Parser path (live query execution) is FULLY parameterized.
- Serde deserialize path falls back to `DEFAULT_VID_A` / `DEFAULT_VID_B`
  (preserves the 100, 200 scaffold defaults).

Same infrastructure follow-up would close both round-trip gaps at once
(PAIR_MEETING.dMeet and CROSS_DISTANCE.vidA/vidB).

## Build / test verification

Same as PR MobilityDB#19 — submitted for maintainer build verification
(@marianaGarcez). Constants now flow through std::stoull instead of
std::stod; lambda gets two nautilus::val<uint64_t> args instead of one
nautilus::val<double>. Pattern is structurally identical.
…codegen path

Closes the Nebula structural parity gap with Flink/Kafka by shipping
the codegen infrastructure for generating per-MEOS-function pipeline
tuples (logical + physical + parser + lowering). No generated C++
committed in this PR — the maintainer (cc @marianaGarcez) runs the
generator on a chosen MEOS-function batch, reviews output, ships
operators in follow-up PRs at a controlled pace.

Why no generated code in this PR:
- Generator author cannot build NebulaStream (full C++23 + vcpkg
  toolchain not available in author's environment); shipping
  unverified generated code would risk batched-broken operators.
- Per-function review value: maintainer iterates on templates with
  the first batch's build feedback before scaling up.
- Template iteration cost: first-pass templates may need adjustment
  after first build; smaller blast radius if only the generator
  lands.

What lands:
- tools/codegen/codegen_nebula.py — Python generator with embedded
  C++ templates derived 1:1 from the hand-written
  TemporalEDWithinGeometry operator shape (logical/physical/.hpp/.cpp)
- tools/codegen/codegen_input.example.json — first-wave input list
  (5 spatial-relation E/A predicates: EDisjoint, ATouches, ECovers,
  ACrosses, EOverlaps over tgeo_geo)
- tools/codegen/README.md — full design proposal: why codegen, what
  the generator produces, recommended scaling-wave sequence (W1-W5),
  what the generator does NOT do (CMakeLists / parser / grammar
  remain manual paste for idempotence), compile-verification note

Smoke-verified: the generator runs locally + emits 5 operators × 4
files = 20 well-formed C++ source files; templates produce
syntactically-reasonable output matching the existing operator style.

Scaling path (recommended sequence):
- W1: 5 spatial-relation E/A predicates (the example input) — first
  follow-up PR
- W2: All ever/always spatial-relation predicates over tgeo_geo
  (~18 functions) — second follow-up PR
- W3: Distance functions over tgeo_geo and tgeo_tgeo (~30) — third
- W4: Scalar accessors that decompose to per-event reads — template
  extension required
- W5: Aggregations (windowed/cross-stream) — separate generator with
  the aggregation-specific 4-layer pattern

Stacks on PR MobilityDB#20. Tools-only; touches no operator code, no
CMakeLists, no parser/grammar.

Two adjacent compile-breakers found while validating the codegen output of
PR MobilityDB#21 against the latest mariana/main:

1. SerializableAggregationFunction proto declares only {type, on_field,
   as_field}. The 5 MEOS aggregations landing in MobilityDB#16/MobilityDB#17 read additional
   fields out of the proto (vidA/vidB/dMeet/...), so they need the extra
   field. Adds:

       repeated SerializableFunction extra_fields = 4;

   Backwards-compatible (tag 4, new repeated). Aggregations whose extra
   fields are absent continue to deserialize unchanged.

2. CrossDistance/PairMeeting/TemporalLength aggregations carry an unused
   PipelineMemoryProvider& parameter on lower(). -Werror=unused-parameter
   turns that into a build failure. Annotates the parameter [[maybe_unused]]
   in the lower() signature: no behavior change, and the intent stays visible
   to readers who later wire memory into the lowering.

Verified locally on the mobilitynebula-v2 dev image (MEOS baked in):

    cmake --build build-w1 --target nes-physical-operators -j 4
    → [110/111] Linking libnes-physical-operators-registry.a
    → [111/111] Linking libnes-physical-operators.a

Stacks on MobilityDB#21 only because that is the active codegen branch where the
breakage surfaced; the diff itself is independent of any codegen output.
…geom)

First batch of MEOS operators generated by the PR MobilityDB#21 codegen, covering
the spatial-relation family over (tgeo, geometry). Five operators landed,
one per relation pattern:

    edisjoint_tgeo_geo  → TemporalEDisjointGeometry
    atouches_tgeo_geo   → TemporalATouchesGeometry
    ecovers_tgeo_geo    → TemporalECoversGeometry
    acontains_tgeo_geo  → TemporalAContainsGeometry
    etouches_tgeo_geo   → TemporalETouchesGeometry

Each operator is emitted as four files (logical .hpp/.cpp + physical
.hpp/.cpp), the same shape as mariana's hand-written eContainsGeometry
operator, so the runtime sees them as ordinary plugin operators
with no special wiring.

Generator tightenings landed alongside the output (kept inside
tools/codegen so they remain re-runnable):

  * physical Registrar reads PhysicalFunctionRegistryArguments.childFunctions
    (the actual field name; the previous template used .children which only
    exists on the logical side).
  * VariableSizedData is accessed through .getContent() / .getContentSize()
    (the real API; the previous template used .getRawByteRef() / .size()
    which do not exist).
  * The MEOS spatial-rel signature is 2-arg (Temporal*, GSERIALIZED*) —
    no trailing atstart bool. The 3-arg distance form lives only on
    edwithin_tgeo_geo and edwithin_tgeo_tgeo and stays out of W1.
  * tools/codegen/codegen_input.example.json now references real MEOS
    symbols (etouches_tgeo_geo, acontains_tgeo_geo). The earlier
    eoverlaps_tgeo_geo / acrosses_tgeo_geo entries were placeholders
    and would not link.

Verified locally on the mobilitynebula-v2 dev image (MEOS baked in):

    cmake --build build-w1 --target nes-logical-operators -j 4
    cmake --build build-w1 --target nes-physical-operators -j 4
    → both link clean. The 5 new operators compile and register at both
      layers.

Stacks on PR-A (proto extra_fields + Werror unused-param) and PR MobilityDB#21
(the codegen itself). The same generator scales to W2 (e/a spatial-rels
over tgeo × tgeo, ~10 ops) and W3 (distance functions over tgeo × geo +
tgeo × tgeo, ~30 ops) with no further template work — that is the path
the 9 BerlinMOD-query recipes open beyond the surface metric.

Adds the 2 remaining publicly-declared 2-arg spatial-rel ops over
(tgeo, geometry) not yet covered by W1 + mariana's seeds:

    adisjoint_tgeo_geo    → TemporalADisjointGeometry
    eintersects_tgeo_geo  → TemporalEIntersectsGeometry

Combined with the prior layers, the public-API _tgeo_geo spatial-rel
row is now complete for the 2-arg shape:

    e:  econtains  ecovers  edisjoint  eintersects  etouches   (5/5)
    a:  acontains  adisjoint  aintersects  atouches            (4/4)

Provenance per layer:
- mariana seeds: TemporalEContainsGeometry, TemporalAIntersectsGeometry,
  TemporalEDWithinGeometry (3-arg), TemporalIntersectsGeometry
- W1 (PR MobilityDB#23):   edisjoint, atouches, ecovers, acontains, etouches
- W2 (this PR):  adisjoint, eintersects

The 3-arg dwithin pair (edwithin / adwithin) is excluded from the
2-arg shape and stays out of this PR.

Note on acovers_tgeo_geo: the symbol exists in libmeos.so but has
no public declaration in meos_geo.h (libmeos-internal only), so it
is correctly out of scope for a binding-level PR.

Local verification on the mobilitynebula-v2 dev image:
    cmake --build build-w1 --target nes-physical-operators -j 4
      → [161/161] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [43/43] Linking libnes-logical-operators.a

Same generator, no template changes, just 2 more input rows — the
mechanical-scale path the 9 BerlinMOD-query recipes open.
… (9 ops)

Closes the public-API _tgeo_tgeo 2-arg spatial-relation row by emitting
all 9 publicly-declared ops as Nebula operators (one new op per relation,
per e/a quantifier). The MEOS signature is
`int fn(const Temporal*, const Temporal*)`, so each operator builds
TWO single-instant tgeompoints from event fields (lonA/latA/tsA +
lonB/latB/tsB) before invoking MEOS:

    econtains_tgeo_tgeo    → TemporalEContainsTGeometry
    ecovers_tgeo_tgeo      → TemporalECoversTGeometry
    edisjoint_tgeo_tgeo    → TemporalEDisjointTGeometry
    eintersects_tgeo_tgeo  → TemporalEIntersectsTGeometry
    etouches_tgeo_tgeo     → TemporalETouchesTGeometry
    acontains_tgeo_tgeo    → TemporalAContainsTGeometry
    adisjoint_tgeo_tgeo    → TemporalADisjointTGeometry
    aintersects_tgeo_tgeo  → TemporalAIntersectsTGeometry
    atouches_tgeo_tgeo     → TemporalATouchesTGeometry

The 3-arg dwithin pair (edwithin / adwithin) stays out — same as in
W1/W2, they belong to a separate distance-arg template branch.
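
As an illustration of the two-input shape, the Python sketch below turns one event's six fields into two single-instant tgeompoint text literals. It assumes epoch-second timestamps, an SRID=4326 prefix, and the `POINT(lon lat)@timestamp` text form; the source does not state which encoding the generated operators actually use, and the function names are hypothetical:

```python
from datetime import datetime, timezone


def instant_wkt(lon, lat, ts):
    """Text form of a single-instant tgeompoint (assumed SRID 4326, epoch seconds)."""
    stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00")
    return f"SRID=4326;POINT({lon} {lat})@{stamp}"


def two_instants(lon_a, lat_a, ts_a, lon_b, lat_b, ts_b):
    """Build both temporal inputs for an fn(const Temporal*, const Temporal*) call."""
    return instant_wkt(lon_a, lat_a, ts_a), instant_wkt(lon_b, lat_b, ts_b)


left, right = two_instants(4.35, 50.85, 0, 4.351, 50.85, 60)
print(left)
print(right)
```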

Generator extension
-------------------

This is the first PR where the codegen ships a NEW template branch in
addition to new rows. Adds:

  * PHYSICAL_CPP_TEMPLATE_TWO_TEMPORAL_POINTS — mirrors the one-temporal-point
    template, but with two single-instant tgeompoints and no static
    geometry argument.
  * `build_two_temporal_points` boolean flag on operator descriptors,
    dispatched alongside `build_temporal_point` in `emit_operator`.

No existing template paths change. Row totals:

| family | _tgeo_tgeo (2-arg) ops in meos_geo.h | shipped |
|--------|--------------------------------------|---------|
| e/*    | econtains, ecovers, edisjoint, eintersects, etouches | 5/5 |
| a/*    | acontains, adisjoint, aintersects, atouches          | 4/4 |

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-physical-operators -j 4
      → [38/38] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [52/52] Linking libnes-logical-operators.a

Both targets link clean on the first attempt — the template extension
worked without iteration, validating the generator approach for the next
shape (distance functions, 3-arg signature).
…mplates)

Closes the public-API distance-function row over (tgeo, geo) and
(tgeo, tgeo). Two distinct measure types, both built from the same
event-field shape used by W1/W2/W3:

Scalar measure — `nad_*` (nearest-approach distance, double return):
    nad_tgeo_geo    → TemporalNADGeometry
    nad_tgeo_tgeo   → TemporalNADTGeometry

Thresholded test — `*dwithin_*` (3-arg, int return):
    edwithin_tgeo_tgeo  → TemporalEDWithinTGeometry
    adwithin_tgeo_geo   → TemporalADWithinGeometry
    adwithin_tgeo_tgeo  → TemporalADWithinTGeometry

`edwithin_tgeo_geo` is already shipped as mariana's `TemporalEDWithinGeometry`
seed, so the (e/a × tgeo_geo/tgeo_tgeo) dwithin square is now complete.

Row totals after this PR (publicly-declared in meos_geo.h):

| shape                 | covered                |
|-----------------------|------------------------|
| nad_tgeo_geo          | 1/1 ✅                |
| nad_tgeo_tgeo         | 1/1 ✅                |
| edwithin_tgeo_geo     | 1/1 (mariana seed) ✅  |
| edwithin_tgeo_tgeo    | 1/1 ✅                 |
| adwithin_tgeo_geo     | 1/1 ✅                 |
| adwithin_tgeo_tgeo    | 1/1 ✅                 |

Generator extension
-------------------

Two new template branches; existing branches untouched:

  * PHYSICAL_CPP_TEMPLATE_TEMPORAL_POINT_WITH_DIST
    — one-tgeo + static geometry + trailing `double dist` (5 args).
  * PHYSICAL_CPP_TEMPLATE_TWO_TEMPORAL_POINTS_WITH_DIST
    — two-tgeo + trailing `double dist` (7 args).

Dispatch in `emit_operator` extends the existing if/elif chain with
`build_temporal_point_with_dist` and `build_two_temporal_points_with_dist`
flags. NAD reuses the existing temporal-point / two-temporal-points
branches with no template change — only `return_type="double"` and
`nautilus_return="FLOAT64"` differ at the operator-descriptor level.
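
A Python sketch of what the flag dispatch might look like. The flag names and three of the template names come from the text above; the name `PHYSICAL_CPP_TEMPLATE_TEMPORAL_POINT` for the original one-temporal-point template, the order of the chain, and the function name `select_template` are assumptions. Argument counts follow the four shapes used across W1 to W4:

```python
def select_template(descriptor):
    """Pick (template name, SQL argument count) from an operator descriptor's flags."""
    if descriptor.get("build_two_temporal_points_with_dist"):
        return "PHYSICAL_CPP_TEMPLATE_TWO_TEMPORAL_POINTS_WITH_DIST", 7
    elif descriptor.get("build_temporal_point_with_dist"):
        return "PHYSICAL_CPP_TEMPLATE_TEMPORAL_POINT_WITH_DIST", 5
    elif descriptor.get("build_two_temporal_points"):
        return "PHYSICAL_CPP_TEMPLATE_TWO_TEMPORAL_POINTS", 6
    elif descriptor.get("build_temporal_point"):
        # Hypothetical name for the pre-existing one-temporal-point template.
        return "PHYSICAL_CPP_TEMPLATE_TEMPORAL_POINT", 4
    raise ValueError(f"no template flag set on {descriptor.get('meos_fn')!r}")


# NAD reuses an existing branch; only the return type differs on the descriptor.
nad = {
    "meos_fn": "nad_tgeo_tgeo",
    "build_two_temporal_points": True,
    "return_type": "double",
    "nautilus_return": "FLOAT64",
}
adwithin = {"meos_fn": "adwithin_tgeo_geo", "build_temporal_point_with_dist": True}

print(select_template(nad))
print(select_template(adwithin))
```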

Local verification on the mobilitynebula-v2 dev image:
    cmake --build build-w1 --target nes-physical-operators -j 4
      → [43/43] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [57/57] Linking libnes-logical-operators.a

Both targets link clean on the first attempt.
…0 dispatch cases)

Extends the codegen to back-fill the SQL-parser glue that the W1–W4
PRs (MobilityDB#23–MobilityDB#26) shipped without, so the 21 generated operators become
SQL-invokable end-to-end instead of just runtime-registered plugins
waiting for manual wiring.

What the codegen now writes
---------------------------

After emitting the .hpp/.cpp files, the codegen idempotently injects
into the existing in-tree files:

  * nes-sql-parser/AntlrSQL.g4
    - lexer-token entries  (TOKEN: 'TOKEN' | 'token';) bracketed
      with /* BEGIN/END CODEGEN LEXER TOKENS */ marker
    - functionName: alternation list updated with new tokens
  * nes-sql-parser/src/AntlrSQLQueryPlanCreator.cpp
    - #include <Functions/Meos/XxxLogicalFunction.hpp> per op
    - case AntlrSQLLexer::TOKEN: { ... } dispatch block per op,
      bracketed with /* BEGIN/END CODEGEN PARSER GLUE: TOKEN */
  * nes-{logical,physical}-operators/src/Functions/Meos/CMakeLists.txt
    - add_plugin(NebulaName {Logical,Physical}Function ...) per op

Idempotency: every per-op injection skips when either the codegen
marker is present OR a pre-existing hand-written case (no marker) is
already in the file. Re-running the codegen on the same input is a
no-op for the parser side; only the .hpp/.cpp emitters re-write
deterministically.
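
The skip rule can be sketched in Python as below. The marker strings follow the BEGIN/END comment pattern quoted above, read here as a separate BEGIN and END comment per token; the function name is hypothetical, and appending at the end of the text is a simplification, since the real insertion point inside the dispatch switch is not described here:

```python
def inject_case(source, token, case_body):
    """Return (new_source, changed); a no-op if the case is already present."""
    begin = f"/* BEGIN CODEGEN PARSER GLUE: {token} */"
    end = f"/* END CODEGEN PARSER GLUE: {token} */"
    hand_written = f"case AntlrSQLLexer::{token}:"
    # Skip when a codegen marker OR an unmarked hand-written case exists.
    if begin in source or hand_written in source:
        return source, False
    block = f"{begin}\n{hand_written} {{\n{case_body}\n}}\n{end}\n"
    return source + block, True


src = "case AntlrSQLLexer::TEMPORAL_EINTERSECTS_GEOMETRY: { /* hand-written */ }\n"

# A hand-written case is detected and left alone.
src, changed_manual = inject_case(src, "TEMPORAL_EINTERSECTS_GEOMETRY", "    /* generated */")

# A new token is injected once; the second run is a no-op.
src, changed_first = inject_case(src, "TEMPORAL_EDISJOINT_GEOMETRY", "    /* generated */")
src_again, changed_second = inject_case(src, "TEMPORAL_EDISJOINT_GEOMETRY", "    /* generated */")

print(changed_manual, changed_first, changed_second)
```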

Two opt-out CLI flags:
    --no-parser-glue     skip .g4 + parser .cpp injection
    --no-cmake-entries   skip CMakeLists.txt injection

Four dispatch-case templates by shape
-------------------------------------

  * one tgeo + static geom        (4 args:  lon, lat, ts, geom)
  * two tgeos                     (6 args:  lonA, latA, tsA, lonB, latB, tsB)
  * one tgeo + static geom + dist (5 args:  lon, lat, ts, geom, dist)
  * two tgeos + dist              (7 args:  lonA, latA, tsA, lonB, latB, tsB, dist)

The constantBuilder→functionBuilder lift mirrors mariana's pattern
from TGEO_AT_STBOX and EDWITHIN_TGEO_GEO (TRUE/FALSE → BOOLEAN,
strtod-clean → FLOAT64, else → VARSIZED), so distance literals and
WKT literals deserialize the same way the hand-written ops do.
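
The three-way literal lift can be approximated as follows. This is a sketch under stated assumptions: `classify_constant` is a hypothetical name, the uppercase-only TRUE/FALSE match is an assumption, and the regular expression only approximates "strtod-clean" for plain decimal literals.

```python
import re

# Approximation of "strtod-clean" for plain decimal literals only. C strtod
# also accepts hex floats, "inf" and "nan", which this sketch does not model.
_DECIMAL = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def classify_constant(literal: str) -> str:
    """Return the type name a SQL constant would be lifted to."""
    if literal in ("TRUE", "FALSE"):  # assumption: exact uppercase match
        return "BOOLEAN"
    if _DECIMAL.fullmatch(literal):
        return "FLOAT64"
    # Everything else (for example a WKT geometry) stays a variable-sized blob.
    return "VARSIZED"
```

Under these rules a distance literal such as `2.5` is lifted to FLOAT64 and a WKT literal such as `POINT(1 2)` to VARSIZED.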

Back-fill: 20 new dispatch cases + 21 includes + 20 lexer tokens
----------------------------------------------------------------

Ran the codegen against the combined W1+W2+W3+W4 input (21 ops). One
of the 21 (TEMPORAL_EINTERSECTS_GEOMETRY) was already wired manually
by mariana so the codegen detected and skipped it; 20 cases injected
clean. nes-sql-parser links green with the regenerated ANTLR lexer +
parser stubs.

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-sql-parser  -j 4 → links clean
    cmake --build build-w1 --target nes-logical-operators  -j 4 → up to date
    cmake --build build-w1 --target nes-physical-operators -j 4 → up to date

What this unlocks
-----------------

The 21 W1–W4 operators are now SQL-invokable end-to-end. From now on,
every codegen PR ships parser glue in-PR by default; skipping it
requires passing `--no-parser-glue` explicitly. The waves past the
spatial-rel surface (W5 tnumber scalar, W5b extended types, W7
aggregations) inherit this closed loop.
…stests)

First-batch tnumber-shape operators. The MEOS surface for nearest-approach
distance over tnumber types is small (4 publicly-declared ops in meos.h
beyond the TBox-arg variants, which are deferred):

    nad_tfloat_float  → TemporalNADFloatScalar    (3 args: value, ts, scalar)
    nad_tint_int      → TemporalNADIntScalar      (3 args: value, ts, scalar)
    nad_tfloat_tfloat → TemporalNADTFloat         (4 args: vA, tsA, vB, tsB)
    nad_tint_tint     → TemporalNADTInt           (4 args: vA, tsA, vB, tsB)

Single-instant tnumber construction uses MEOS's text constructor
`tfloat_in`/`tint_in` over a per-event WKT string "value@ts", mirroring
the existing tgeompoint pattern (where the WKT is built per record from
event fields and parsed by `temporal_in`). The constructed Temporal* is
freed after the MEOS call.
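
As an illustration of the per-event string, the sketch below builds the "value@ts" text that would be handed to `tfloat_in` or `tint_in`. The helper name and the timestamp layout in the usage note are assumptions. The generated C++ formats the string at run time, and Python's `str.format` only stands in for that here.

```python
def tnumber_instant_wkt(value, ts: str, fmt: str = "{}@{}") -> str:
    """Build the single-instant text form "value@ts" for one event.

    Hypothetical helper: it models only the string layout, not the MEOS call.
    """
    return fmt.format(value, ts)
```

For example, `tnumber_instant_wkt(1.5, "2025-01-01 00:00:00+00")` yields `1.5@2025-01-01 00:00:00+00`.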

Generator additions
-------------------

Two new physical-cpp template branches + two new parser-glue dispatch-case
templates, all plumbed through emit_operator's existing flag dispatch:

  * PHYSICAL_CPP_TEMPLATE_TNUMBER_POINT_WITH_SCALAR
    — flag: build_tnumber_point_with_scalar
  * PHYSICAL_CPP_TEMPLATE_TWO_TNUMBER_POINTS
    — flag: build_two_tnumber_points
  * DISPATCH_CASE_TNUMBER_POINT_WITH_SCALAR     (3-arg dispatch)
  * DISPATCH_CASE_TWO_TNUMBER_POINTS            (4-arg dispatch)

Per-op extras in the JSON descriptor parameterize tnumber type (FLOAT64
or INT32) and the MEOS `*_in` constructor:
    "tnumber_value_cpp_type": "double" | "int32_t"
    "scalar_cpp_type":        "double" | "int32_t"
    "tnumber_in_fn":          "tfloat_in" | "tint_in"
    "tnumber_wkt_format":     "{}@{}"  (consumed by fmt::format at runtime)

Codegen anchor fix
------------------

The parser-dispatch anchor regex tuned for the pre-W4.5 layout
(TGEO_AT_STBOX → default:) no longer matched after W4.5 injected 20
cases between the two. New logic: insert just after the LAST
`/* END CODEGEN PARSER GLUE: ... */` marker if any exist (so successive
codegen runs cluster their cases), else fall back to the original
TGEO_AT_STBOX→default anchor.
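
A rough sketch of the new anchor logic, assuming the markers appear exactly as written above. `insertion_offset` is a hypothetical name, and the fallback is simplified to the position of `default:` rather than the real TGEO_AT_STBOX→default regex.

```python
import re

# Matches one END marker, including a trailing newline if there is one.
END_MARKER = re.compile(r"/\* END CODEGEN PARSER GLUE: [^*]* \*/\n?")


def insertion_offset(source: str, fallback_anchor: str = "default:") -> int:
    """Return the character offset at which new dispatch cases are inserted."""
    last = None
    for last in END_MARKER.finditer(source):
        pass  # keep only the final match
    if last is not None:
        # Cluster new cases right after the last generated block.
        return last.end()
    pos = source.find(fallback_anchor)
    if pos < 0:
        raise ValueError("no codegen marker and no fallback anchor found")
    return pos
```

Anchoring on the last END marker means successive runs append after earlier generated cases instead of depending on the layout of hand-written ones.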

Per-shape systests
------------------

Two new .test files in Tests/Functions/ — one per dispatch shape:

  * nad_tfloat_float.test    (one-tnumber + scalar; 3 rows; expected distance)
  * nad_tfloat_tfloat.test   (two-tnumbers; 3 rows; expected distance)

Per the testing-cadence directive: every codegen PR ships at least one
systest per dispatch shape it introduces.

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-physical-operators -j 4
      → [47/47] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [61/61] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-sql-parser -j 4
      → [11/11] Linking libnes-sql-parser.a

All three targets link clean on the first attempt — both new template
branches worked without iteration, and the parser-anchor fix is in
the generator so subsequent W5b/W6/W7 inherit it.
…mplate + 1 systest)

First restriction-shape operators. MEOS signature is
`Temporal* fn(const Temporal*, const GSERIALIZED*)` — returns the
clipped Temporal* (non-null if input survives the restriction, null
if clipped to empty).

For per-event single-instant inputs (the codegen's current shape), the
restriction collapses to a filter predicate: 1 if the point survives,
0 if clipped. This mirrors mariana's TemporalAtStBox int-collapse
pattern exactly — see TemporalAtStBoxPhysicalFunction.cpp:90 for the
hand-written precedent (`clipped.get() != nullptr ? 1 : 0`).

Operators
---------

    tgeo_at_geom    → TemporalAtGeometry      (4 args; survives if point inside the geom)
    tgeo_minus_geom → TemporalMinusGeometry   (4 args; survives if point outside the geom)

Honest semantic note
--------------------

Per-event single-instant TEMPORAL_AT_GEOMETRY is **semantically equivalent**
to TEMPORAL_ECONTAINS_GEOMETRY (PR MobilityDB#23), and TEMPORAL_MINUS_GEOMETRY ≡
TEMPORAL_EDISJOINT_GEOMETRY. The restriction ops only add genuinely new
SQL surface when the input tgeompoint is a *sequence* of multiple
instants (W7-territory — windowed aggregations), where clipping produces
a different sequence than the original. Shipped now because:

  1. They round out the SQL surface PostGIS / MobilityDB users expect
     (the `AT`/`MINUS` idiom is standard there).
  2. They exercise the codegen's first restriction-shape template, which
     W7 sequence-aggregated restriction will inherit.
  3. The collapse-to-int return matches mariana's TemporalAtStBox so
     downstream consumers see a consistent shape across at/minus ops.

Generator additions
-------------------

  * PHYSICAL_CPP_TEMPLATE_TEMPORAL_POINT_RESTRICTION
    — calls `Temporal* {meos_call}(...)`, checks non-null, frees, returns int.
    Flag: `build_temporal_point_restriction`.
  * dispatch_case_for() reuses the existing DISPATCH_CASE_ONE_TEMPORAL_POINT
    template — same 4-arg parser shape (lon, lat, ts, geom), only the
    physical-cpp body shape differs (`Temporal*` return vs `int` return).

Per-shape systest
-----------------

Tests/Functions/at_geometry.test exercises TEMPORAL_AT_GEOMETRY: one
point inside a polygon (expect 1), one outside (expect 0).

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-physical-operators -j 4
      → [49/49] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [63/63] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-sql-parser -j 4
      → [11/11] Linking libnes-sql-parser.a

All three targets link clean on the first attempt.
…n/codegen_aggregations.py + 2 systests)

Companion to codegen_nebula.py: a separate generator targeting the
windowed-aggregation surface — MEOS scalar functions of the shape
`<scalar> fn(const Temporal*)` where the Temporal* is a per-(window,
group) sequence assembled across multiple events.

Operators
---------

3 tgeo-shape aggregations (lift = (lon, lat, ts), lower = trajectory):

    temporal_num_instants   → TemporalNumInstants
    temporal_num_sequences  → TemporalNumSequences
    temporal_num_timestamps → TemporalNumTimestamps

9 tnumber-shape aggregations (lift = (value, ts), lower = sequence):

    tfloat_start_value  → TemporalTFloatStartValue
    tfloat_end_value    → TemporalTFloatEndValue
    tfloat_min_value    → TemporalTFloatMinValue
    tfloat_max_value    → TemporalTFloatMaxValue
    tnumber_integral    → TemporalTNumberIntegral
    tint_start_value    → TemporalTIntStartValue
    tint_end_value      → TemporalTIntEndValue
    tint_min_value      → TemporalTIntMinValue
    tint_max_value      → TemporalTIntMaxValue

12 ops, under the 15-op-per-PR cap. Each op emits 4 layer files
(logical .hpp + .cpp, physical .hpp + .cpp) mirroring mariana's hand-written
TemporalLengthAggregation 1:1.

Why a separate generator
------------------------

Aggregations live in DIFFERENT directories from the per-event ops:
  * nes-{logical,physical}-operators/.../Aggregation*/  (this generator)
  * nes-{logical,physical}-operators/.../Functions/Meos/  (codegen_nebula.py)

They use a DIFFERENT base class (AggregationPhysicalFunction vs
PhysicalFunction), DIFFERENT parser dispatch (windowAggs accumulator
vs functionBuilder stack), and DIFFERENT registry. Keeping them in
separate generators preserves shape cohesion and matches the
in-tree directory split.

What the generator writes
-------------------------

Per op, 4 emitted code files (above), AND idempotent injection into 5
shared files:

  * nes-sql-parser/AntlrSQL.g4
        - lexer-token entries
        - functionName: alternation list
  * nes-sql-parser/src/AntlrSQLQueryPlanCreator.cpp
        - case AntlrSQLLexer::TOKEN: dispatch (dedicated-token switch)
        - else if (funcName == "TOKEN") dispatch (IDENTIFIER fallback chain)
  * nes-query-optimizer/src/RewriteRules/LowerToPhysical/
        LowerToPhysicalWindowedAggregation.cpp
        - if (name == "Xxx") block lowering logical → physical descriptor
  * nes-{logical,physical}-operators/.../Aggregation*/CMakeLists.txt
        - add_plugin(...) per layer

All injections are bracketed with
`/* BEGIN CODEGEN AGGREGATION GLUE: TOKEN ... */` markers so re-runs are
no-ops; pre-existing hand-written cases (mariana's TemporalLength,
PairMeeting, CrossDistance) are detected by raw token match and skipped.

Two lift-shape branches selected by descriptor.input_shape:
  * "tgeo"     — 3 fields per event; lower builds {Point(lon lat)@ts, ...}
                  parsed via MEOS::Meos::parseTemporalPoint.
  * "tnumber"  — 2 fields per event; lower builds {value@ts, ...} parsed
                  via tfloat_in or tint_in per descriptor.
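
The two lower() string shapes can be sketched as below. The function names and the event tuples are illustrative assumptions. The real lower() runs in generated C++ and hands the string to the MEOS parser.

```python
def lower_tgeo(events) -> str:
    """events: iterable of (lon, lat, ts) -> "{Point(lon lat)@ts, ...}"."""
    return "{" + ", ".join(f"Point({lon} {lat})@{ts}" for lon, lat, ts in events) + "}"


def lower_tnumber(events) -> str:
    """events: iterable of (value, ts) -> "{value@ts, ...}"."""
    return "{" + ", ".join(f"{value}@{ts}" for value, ts in events) + "}"
```

Each (window, group) accumulates its events through lift and builds one such string in lower, so the MEOS scalar function sees a single Temporal* per group.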

Codegen target-naming convention
--------------------------------

Mariana's CMakeLists target name is the SQL aggregation name (e.g.
`TemporalLength`), NOT the C++ class basename (`TemporalLengthAggregation`).
The registry-codegen appends "Aggregation<RegistryKind>" to the target name,
so a target ending in "Aggregation" would yield a double-Aggregation
function symbol (caught here on the first build by linker error "did you
mean RegisterTemporalXXXAggregationAggregationLogicalFunction"). The
generator follows the mariana convention exactly.
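
The naming pitfall is easy to reproduce in miniature. The sketch assumes the registry codegen forms the symbol as `Register` + target + `Aggregation` + registry kind. The `Register` prefix is inferred from the linker hint quoted above, and `registry_symbol` is a hypothetical name.

```python
def registry_symbol(target: str, registry_kind: str) -> str:
    """Model how the registry codegen is assumed to derive the symbol name."""
    return f"Register{target}Aggregation{registry_kind}"


# Target named after the SQL aggregation: the expected symbol.
good = registry_symbol("TemporalLength", "LogicalFunction")
# Target named after the C++ class basename: "Aggregation" is doubled.
bad = registry_symbol("TemporalLengthAggregation", "LogicalFunction")
```

Here `good` is `RegisterTemporalLengthAggregationLogicalFunction`, while `bad` carries the doubled `AggregationAggregation` that the linker complained about.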

Per-shape systests
------------------

  * Tests/Functions/temporal_num_instants.test     — tgeo aggregation
  * Tests/Functions/temporal_tfloat_max_value.test — tnumber aggregation

Per the testing-cadence directive: every codegen PR ships at least one
systest per dispatch shape it introduces.

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-logical-operators  -j 4
      → [53/53] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-physical-operators -j 4
      → up to date
    cmake --build build-w1 --target nes-sql-parser         -j 4
      → [11/11] Linking libnes-sql-parser.a
    cmake --build build-w1 --target nes-query-optimizer    -j 4
      → up to date

All four targets link clean. The aggregation generator scales to any
single-Temporal*→scalar MEOS function by adding rows to the descriptor
JSON; new lift shapes (tcbuffer, tnpoint, tpose, …) need new template
branches following the tgeo/tnumber pattern.
…nical row-add)

Three more tnumber-shape aggregations fitting the existing W7 generator
templates exactly — no template work, only new descriptor rows. Validates
that the W7 aggregation generator scales by JSON-row addition for any
new single-Temporal*->scalar MEOS function with no further code change.

    tfloat_avg_value   → TemporalTFloatAvgValue
    tnumber_twavg      → TemporalTNumberTwAvg   (time-weighted average, tfloat input)
    tnumber_avg_value  → TemporalTIntAvgValue   (any-numeric MEOS fn applied via tint_in lift)

Note: tnumber_avg_value accepts any numeric Temporal* (tfloat or tint).
It is wrapped here via the tint_in lift to round out the tint side of the
average family; the tfloat side uses the type-specific tfloat_avg_value.

Per-shape systest
-----------------

Tests/Functions/temporal_tnumber_twavg.test — exercises TwAvg with a
known weighted-mean computation across 3+2 events per group.

No new shape is introduced (this PR adds rows to the existing tnumber-
aggregation shape covered by W7's temporal_tfloat_max_value.test), so
the single twavg systest is supplementary rather than per-shape-required.

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-logical-operators -j 4
      → [59/59] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-physical-operators -j 4
      → up to date
    cmake --build build-w1 --target nes-sql-parser -j 4
      → [11/11] Linking libnes-sql-parser.a
    cmake --build build-w1 --target nes-query-optimizer -j 4
      → up to date

All four targets link clean on the first build.
… ops; bool+int64)

Five more tgeo-shape aggregations on the existing W7 template, exercising
two RETURN types the generator had not yet emitted (bool and int64).
Validates that the generator handles all four MEOS scalar return types
(int32, double, int64, bool) with zero template change — only new
descriptor rows in the JSON.

    temporal_start_timestamptz → TemporalStartTimestamp   (int64, TimestampTz)
    temporal_end_timestamptz   → TemporalEndTimestamp     (int64, TimestampTz)
    temporal_lower_inc         → TemporalLowerInc         (bool)
    temporal_upper_inc         → TemporalUpperInc         (bool)
    tpoint_is_simple           → TemporalTPointIsSimple   (bool)

All five use the existing tgeo lift shape (lon, lat, ts). The bool
and int64 final-stamp types map directly to the Nautilus val<>
templated wrapper without any template modification.

Per-shape systest
-----------------

Tests/Functions/temporal_tpoint_is_simple.test — exercises the bool
return path with one simple trajectory (expect TRUE) and one self-
intersecting trajectory (expect FALSE).

No new lift/dispatch shape is introduced; the systest is added to
demonstrate the BOOLEAN return type actually executes correctly
(belt-and-suspenders for the first PR exercising it).

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-logical-operators -j 4
      → [69/69] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-physical-operators -j 4
      → up to date
    cmake --build build-w1 --target nes-sql-parser -j 4
      → [11/11] Linking libnes-sql-parser.a
    cmake --build build-w1 --target nes-query-optimizer -j 4
      → up to date

All four targets link clean on the first build.
…plate + 1 systest)

First extended-type batch: tcbuffer (circular buffer = point + radius)
spatial-relations against a static geometry. New 5-arg lift shape
(lon, lat, radius, ts, geometry) extends the codegen to its third
primitive Temporal* family beyond tgeo and tnumber.

    econtains/ecovers/edisjoint/eintersects/etouches _tcbuffer_geo (5 e-ops)
    acontains/acovers/adisjoint/aintersects/atouches _tcbuffer_geo (5 a-ops)

Per-event tcbuffer is constructed via tcbuffer_in() with WKT format
`Cbuffer(Point(lon lat),radius)@ts` (format confirmed by probing the
MEOS library directly). The Temporal* is freed after the MEOS call.
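
For illustration, the per-event text handed to tcbuffer_in() could be assembled as below. The helper name is hypothetical and the timestamp layout in the usage note is an assumption. Only the `Cbuffer(Point(lon lat),radius)@ts` layout comes from the description above.

```python
def tcbuffer_instant_wkt(lon: float, lat: float, radius: float, ts: str) -> str:
    """Build the single-instant tcbuffer text form for one event."""
    return f"Cbuffer(Point({lon} {lat}),{radius})@{ts}"
```

For example, `tcbuffer_instant_wkt(4.35, 50.85, 10.0, "2025-01-01 00:00:00+00")` gives `Cbuffer(Point(4.35 50.85),10.0)@2025-01-01 00:00:00+00`.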

Generator additions
-------------------

One new physical-cpp template branch + one new dispatch-case template;
existing branches untouched:

  * PHYSICAL_CPP_TEMPLATE_TCBUFFER_POINT
    — 5 args: lon, lat, radius, ts, geometry. Calls
      `int {meos_call}(const Temporal*, const GSERIALIZED*)`.
  * DISPATCH_CASE_TCBUFFER_POINT
    — 5-arg parser dispatch with geometry lift as VARSIZED.
  * `build_tcbuffer_point` flag dispatch in emit_operator + dispatch_case_for.

Coverage scope
--------------

W10 covers ONLY the tcbuffer × geo 2-arg spatial-rel row (5 e + 5 a = 10
ops). The publicly declared tcbuffer surface in meos_cbuffer.h includes
more variations (tcbuffer × cbuffer, tcbuffer × tcbuffer, plus the
3-arg dwithin family), each requiring its own template branch and lift
shape. Those follow as future PRs per the ≤15-ops-per-PR cap.

Extended-types coverage at this PR:
  * tcbuffer × geo  (2-arg): 10/10 ✅
  * tcbuffer × cbuffer  (2-arg): 0/10 (separate template)
  * tcbuffer × tcbuffer (2-arg): 0/9  (separate template, 8-arg lift)
  * tcbuffer dwithin    (3-arg): 0/6  (separate template per shape)

Note on tnpoint / tpose
-----------------------

Probing meos_npoint.h and meos_pose.h showed those families have NO
publicly declared spatial-rel ops — their non-tcbuffer surface is
restriction (at/minus), distance (tdistance), and nad. Those are
follow-up PRs, not part of W10.

Per-shape systest
-----------------

Tests/Functions/econtains_tcbuffer_geo.test — one tcbuffer with
radius 10 covering its own center point (expect 1), one with
radius 0.0001 vs a far point (expect 0).

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-physical-operators -j 4
      → [59/59] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [73/73] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-sql-parser -j 4
      → [11/11] Linking libnes-sql-parser.a

All three targets link clean on the first build.
…ew template + 1 systest)

Second tcbuffer batch. The static second arg is now a Cbuffer literal
(parsed via cbuffer_in) instead of a geometry (parsed as WKT).

    econtains/ecovers/edisjoint/eintersects/etouches _tcbuffer_cbuffer (5 e-ops)
    acontains/acovers/adisjoint/aintersects/atouches _tcbuffer_cbuffer (5 a-ops)

The per-event tcbuffer construction is identical to W10 (5-arg lift:
lon, lat, radius, ts, blob). Only the blob-parser differs:

  W10: geometry literal as VARSIZED WKT geometry → MEOS::Meos::StaticGeometry
  W11: cbuffer literal as VARSIZED WKT cbuffer   → cbuffer_in() → Cbuffer*

So the dispatch case (5-arg SQL parser shape) is REUSED — only the
physical-cpp body differs.

Generator additions
-------------------

  * PHYSICAL_CPP_TEMPLATE_TCBUFFER_POINT_CBUFFER — new physical template
    with cbuffer_in() second-arg parser and {meos_call}(Temporal*, Cbuffer*)
    call signature.
  * `build_tcbuffer_point_cbuffer` flag dispatch in emit_operator.
  * dispatch_case_for collapses build_tcbuffer_point and
    build_tcbuffer_point_cbuffer to the same DISPATCH_CASE_TCBUFFER_POINT
    (identical 5-arg SQL shape, only the physical-cpp body differs).

PR-awareness correction
-----------------------

A prior version of W10's PR body (MobilityDB#33) incorrectly stated that tnpoint
and tpose have no spatial-rels in the public MEOS API. That claim was
made against the vcpkg-baked MEOS in this dev image, which lags upstream
MobilityDB master. The retraction is now visible in MobilityDB#33's body. The
upstream-master substrate for tnpoint / tpose spatial-rel parity is in
open MobilityDB PRs:

  #987  Close tpose parity gap with spatial functions, analytics, and tile
        via tgeompoint composition
  #1082 Add the tnpoint typed value accessors to the MEOS public API
  #1083 tcbuffer + tpose typed value constructors
  #1084 tcbuffer/tnpoint/tpose from-base time constructors
  #1085 Export tpose_from_mfjson to MEOS public API

Those substrates feed W12+ (tcbuffer × tcbuffer + tnpoint and tpose
batches).

Per-shape systest
-----------------

Tests/Functions/econtains_tcbuffer_cbuffer.test — one self-intersection
case (expect 1) and one far-apart case with tiny radii (expect 0).

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-physical-operators -j 4
      → [69/69] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [83/83] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-sql-parser -j 4
      → [11/11] Linking libnes-sql-parser.a

All three targets link clean on the first build.
…ps + 8-arg lift + 1 systest)

Third tcbuffer batch. Two per-event tcbuffer instants are built from
(lonA, latA, radiusA, tsA) and (lonB, latB, radiusB, tsB) and passed
to MEOS `int fn(const Temporal*, const Temporal*)`. New 8-arg lift
shape.

    adisjoint/aintersects/atouches _tcbuffer_tcbuffer (3 a-ops)
    ecovers/eintersects/etouches   _tcbuffer_tcbuffer (3 e-ops)

Total: 6 publicly declared 2-arg ops. econtains, edisjoint, acovers and
acontains are NOT publicly declared for _tcbuffer_tcbuffer, so their
coverage rows are omitted.

Generator additions
-------------------

  * PHYSICAL_CPP_TEMPLATE_TWO_TCBUFFER_POINTS
    — 8 args. Two tcbuffer_in() per-event constructions.
  * DISPATCH_CASE_TWO_TCBUFFER_POINTS
    — 8-arg parser dispatch (no constants).
  * `build_two_tcbuffer_points` flag dispatch in emit_operator + dispatch_case_for.

Per-shape systest
-----------------

Tests/Functions/eintersects_tcbuffer_tcbuffer.test — overlapping
tcbuffers (expect 1) vs non-overlapping (expect 0).

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-physical-operators -j 4
      → [75/75] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [89/89] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-sql-parser -j 4
      → [11/11] Linking libnes-sql-parser.a

All three targets link clean on the first build.

Coverage scope
--------------

Tcbuffer 2-arg spatial-rels coverage at this PR:

  tcbuffer × geo      (2-arg): 10/10 ✅ (W10)
  tcbuffer × cbuffer  (2-arg): 10/10 ✅ (W11)
  tcbuffer × tcbuffer (2-arg): 6/6 ✅ (this PR — missing 4 ops are not declared)
  tcbuffer × {geo, cbuffer, tcbuffer} dwithin (3-arg): 6 ops pending future PR

tnpoint and tpose spatial-rel coverage is gated on upstream MobilityDB
PRs (#987, #1082-#1085) reaching this dev image's vcpkg-baked MEOS;
see MobilityDB#33's RETRACTION section for the substrate map.
…t templates + 1 systest)

Closes the in-image-MEOS tcbuffer surface. Adds the 3-arg dwithin
variants across all three tcbuffer × {geo, cbuffer, tcbuffer} sub-shapes,
each with a trailing double distance threshold:

    edwithin_tcbuffer_geo      → TemporalEDWithinTCbufferGeometry  (6-arg)
    adwithin_tcbuffer_geo      → TemporalADWithinTCbufferGeometry  (6-arg)
    edwithin_tcbuffer_cbuffer  → TemporalEDWithinTCbufferCbuffer   (6-arg)
    adwithin_tcbuffer_cbuffer  → TemporalADWithinTCbufferCbuffer   (6-arg)
    edwithin_tcbuffer_tcbuffer → TemporalEDWithinTCbufferTCbuffer  (9-arg)
    adwithin_tcbuffer_tcbuffer → TemporalADWithinTCbufferTCbuffer  (9-arg)

Generator additions
-------------------

Three new physical-cpp template branches (one per sub-shape) + two
new dispatch case templates (the with-dist 6-arg dispatch is shared
across geo and cbuffer because the parser shape is identical — only
the physical-cpp blob-parser differs):

  * PHYSICAL_CPP_TEMPLATE_TCBUFFER_POINT_WITH_DIST
  * PHYSICAL_CPP_TEMPLATE_TCBUFFER_POINT_CBUFFER_WITH_DIST
  * PHYSICAL_CPP_TEMPLATE_TWO_TCBUFFER_POINTS_WITH_DIST
  * DISPATCH_CASE_TCBUFFER_POINT_WITH_DIST            (shared 6-arg)
  * DISPATCH_CASE_TWO_TCBUFFER_POINTS_WITH_DIST       (9-arg)
  * `build_tcbuffer_point_with_dist`, `build_tcbuffer_point_cbuffer_with_dist`,
    `build_two_tcbuffer_points_with_dist` flag dispatch.

Per-shape systest
-----------------

Tests/Functions/edwithin_tcbuffer_tcbuffer.test — overlapping pair
(expect 1) vs far-apart pair (expect 0) at threshold 2.0.

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-physical-operators -j 4
      → [81/81] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [95/95] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-sql-parser -j 4
      → [11/11] Linking libnes-sql-parser.a

All three targets link clean on the first build.

Coverage scope — tcbuffer surface closed for this dev image
-----------------------------------------------------------

After W13 the entire in-image-MEOS tcbuffer 2-arg-and-3-arg spatial-rel
surface is closed:

  tcbuffer × geo      (2-arg, W10): 10/10 ✅
  tcbuffer × cbuffer  (2-arg, W11): 10/10 ✅
  tcbuffer × tcbuffer (2-arg, W12): 6/6 ✅ (4 ops not publicly declared)
  tcbuffer × {geo, cbuffer, tcbuffer} dwithin (this PR): 6/6 ✅

Total tcbuffer ops shipped this session: 32.
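
The total can be rechecked from the four rows above. The dictionary below only restates those rows; the key strings are illustrative.

```python
# (shipped, publicly declared) per row of the coverage table above.
coverage = {
    "tcbuffer x geo (2-arg, W10)": (10, 10),
    "tcbuffer x cbuffer (2-arg, W11)": (10, 10),
    "tcbuffer x tcbuffer (2-arg, W12)": (6, 6),
    "tcbuffer dwithin (3-arg, W13)": (6, 6),
}

shipped = sum(done for done, _ in coverage.values())
declared = sum(total for _, total in coverage.values())
```

Both sums come to 32, matching the stated total.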

tnpoint and tpose spatial-rel coverage remains gated on upstream
MobilityDB PRs (#987, #1082-#1085); see MobilityDB#33's RETRACTION section
and MobilityDB#34's PR-awareness note.
… ops + 1 template + 1 systest)

Closes the tpose × geo spatial-rel parity gap at the Nebula binding
layer using the SAME composition recipe MobilityDB PR #987 uses at the
SQL layer: convert the temporal pose to a temporal geometry point, then
apply the existing _tgeo_geo spatial-rel.

Correcting the record: an earlier W10 PR body (MobilityDB#33) claimed tpose has no
spatial-rels in the public MEOS API. That was wrong — checking open PRs
(per the always-check-PRs rule) shows MobilityDB #987 closes tpose
parity, and the composition primitive tpose_to_tpoint() is ALREADY in
this dev image's MEOS public API. No dev-image rebuild was needed.

    econtains/ecovers/edisjoint/eintersects/etouches via tpose→tgeo (5 e-ops)
    acontains/adisjoint/aintersects/atouches via tpose→tgeo (4 a-ops)

(acovers_tgeo_geo is not publicly declared, so ACovers is correctly out
of scope — same gap noted in W2 for tgeo × geo.)

Composition path (per event):
    Pose(Point(x y), theta)@ts  --tpose_in-->  Temporal* (tpose)
                                --tpose_to_tpoint-->  Temporal* (tgeompoint)
                                --{meos_call}(tgeo, gs)-->  int
Both Temporal* freed after the call.

Generator additions
-------------------

  * PHYSICAL_CPP_TEMPLATE_TPOSE_POINT_VIA_COMPOSITION
    — 5 args (x, y, theta, ts, geometry); builds tpose, converts via
      tpose_to_tpoint, calls the existing _tgeo_geo spatial-rel.
  * `build_tpose_point_via_composition` flag dispatch.
  * dispatch_case_for collapses tcbuffer-point / tcbuffer-cbuffer /
    tpose-composition to the same 5-arg DISPATCH_CASE_TCBUFFER_POINT
    (identical SQL shape: 3 doubles + ts + blob).
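
The collapse of several physical shapes onto one parser shape can be pictured as a lookup from descriptor flag to dispatch template. This is a sketch only: the flag and template names are taken from the commit messages, while the table layout and the `_sketch` function are assumptions about how dispatch_case_for might be organised.

```python
# Several physical-body flags share one 5-arg SQL parser shape.
DISPATCH_CASE_BY_FLAG = {
    "build_tcbuffer_point": "DISPATCH_CASE_TCBUFFER_POINT",
    "build_tcbuffer_point_cbuffer": "DISPATCH_CASE_TCBUFFER_POINT",
    "build_tpose_point_via_composition": "DISPATCH_CASE_TCBUFFER_POINT",
    "build_two_tcbuffer_points": "DISPATCH_CASE_TWO_TCBUFFER_POINTS",
}


def dispatch_case_for_sketch(descriptor: dict) -> str:
    """Pick the parser dispatch template for one operator descriptor."""
    matches = {case for flag, case in DISPATCH_CASE_BY_FLAG.items()
               if descriptor.get(flag)}
    if len(matches) != 1:
        raise ValueError("descriptor must select exactly one dispatch shape")
    return matches.pop()
```

Keeping the parser template keyed on SQL shape rather than on MEOS family is what lets a new family such as tpose reuse an existing dispatch case.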

Per-shape systest
-----------------

Tests/Functions/econtains_tpose_geo.test — tpose at a point contained
by an identical static point (expect 1) vs a far point (expect 0).

Local verification on the mobilitynebula-v2 dev image:

    cmake --build build-w1 --target nes-physical-operators -j 4
      → [90/90] Linking libnes-physical-operators.a
    cmake --build build-w1 --target nes-logical-operators -j 4
      → [104/104] Linking libnes-logical-operators.a
    cmake --build build-w1 --target nes-sql-parser -j 4
      → [11/11] Linking libnes-sql-parser.a

All three targets link clean on the first build.

This composition recipe generalizes: tpose × tpose, tnpoint × geo,
tnpoint × tnpoint all follow the same convert-then-delegate pattern
(tnpoint_to_tgeompoint is likewise already in the public API).
…(9 ops + 1 template + 1 systest)

Completes the tpose family started in W14 (MobilityDB#37): pairs two single-instant
tposes against each other (8 args) instead of one tpose against a static
geometry (5 args). Each tpose is lifted to a tgeompoint via
tpose_to_tpoint at run time, then the existing _tgeo_tgeo spatial-rel
(shipped in W3, MobilityDB#25) is applied — no new MEOS symbols, no dev-image
rebuild.

9 e/a operators: e{contains,covers,disjoint,intersects,touches} +
a{contains,disjoint,intersects,touches}. acovers_tgeo_tgeo is not
publicly declared, so ACovers is out of scope (same gap as W3/W14).

Generator additions:
- PHYSICAL_CPP_TEMPLATE_TWO_TPOSE_POINTS_VIA_COMPOSITION (8-arg two-tpose body)
- build_two_tpose_points_via_composition flag dispatch
- DISPATCH_CASE_TWO_TPOSE_POINTS (8-arg parser case)

Systest: Tests/Functions/eintersects_tpose_tpose.test.

Local verification (nes-development:mobilitynebula-v2): nes-physical-operators,
nes-logical-operators, nes-sql-parser all link clean.
…s + 2 templates + 1 systest)

Unblocks the tnpoint family on NebulaStream. tnpoint composes per-event
exactly like tpose (W14/W15): tnpoint_in -> tnpoint_to_tgeompoint -> the
existing _tgeo_geo / _tgeo_tgeo spatial-rels (W2/W3). The route-geometry
lookup inside tnpoint_to_tgeompoint goes through MEOS's per-thread TLS
ways cache, so the operator carries no network state — the codegen shape
is identical to the other composition waves.

18 ops: 9 tnpoint x geo (_tgeo_geo) + 9 tnpoint x tnpoint (_tgeo_tgeo),
each the e/a set e{contains,covers,disjoint,intersects,touches} +
a{contains,disjoint,intersects,touches}. acovers is not publicly
declared (out of scope, as in W2/W3/W14/W15).

Generator additions:
- PHYSICAL_CPP_TEMPLATE_TNPOINT_POINT_VIA_COMPOSITION (4 args: rid,
  fraction, ts, geometry) + build_tnpoint_point_via_composition
- PHYSICAL_CPP_TEMPLATE_TWO_TNPOINT_POINTS_VIA_COMPOSITION (6 args:
  ridA, fractionA, tsA, ridB, fractionB, tsB) +
  build_two_tnpoint_points_via_composition
- dispatch_case_for reuses the 4-arg / 6-arg parser cases by arity.

Systest: Tests/Functions/eintersects_tnpoint_tnpoint.test.

Runtime note: tnpoint_to_tgeompoint reads the ways network from
/usr/local/share/ways1000.csv (MEOS default path). That file must be
present to run tnpoint queries (a copy ships in MobilityDB at
meos/examples/data/ways1000.csv). tnpoint_to_tgeompoint yields a
tgeompoint in the network SRID, so tnpoint x static-geometry needs the
geometry in that SRID; tnpoint x tnpoint is unaffected.
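
A deployment could guard tnpoint queries with a small preflight check like the one below. The path is the default quoted above; the function name and the non-empty test are assumptions, and nothing here is part of the generated operators.

```python
from pathlib import Path

# Default location quoted in the runtime note above.
DEFAULT_WAYS_CSV = Path("/usr/local/share/ways1000.csv")


def ways_csv_ready(path: Path = DEFAULT_WAYS_CSV) -> bool:
    """Return True when the ways network file exists and is not empty."""
    return path.is_file() and path.stat().st_size > 0
```

Running the check before submitting tnpoint queries turns a missing network file into an explicit setup error rather than a failure inside the operator.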

Local verification (nes-development:mobilitynebula-v2): nes-physical-operators,
nes-logical-operators, nes-sql-parser all link clean.
…nce) (4 ops + 1 systest)

Adds nearest-approach distance for the tpose and tnpoint families,
completing their distance-measure surface alongside the spatial-rels
(W14/W15/W18). No new generator template: nad has the same
(Temporal*, ...) -> scalar shape as the spatial-rels, so the existing
composition templates are reused with a double (FLOAT64) return — exactly
as the tgeo nad ops (TemporalNADGeometry / TemporalNADTGeometry) already
do.

4 ops: TemporalNAD{TPoseGeometry,TPoseTPose,TNpointGeometry,TNpointTNpoint}
calling nad_tgeo_geo / nad_tgeo_tgeo. tpose resolves via tpose_to_tpoint,
tnpoint via tnpoint_to_tgeompoint (network SRID; needs the ways CSV at
run time, same as W18).

Systest: Tests/Functions/nad_tpose_tpose.test (identical tposes -> 0).

Local verification (nes-development:mobilitynebula-v2): nes-physical-operators,
nes-logical-operators, nes-sql-parser all link clean.
…s + 1 systest)

Completes the tpose and tnpoint distance surface (with nad in W19 and the
spatial-rels in W14/W15/W18). tpose/tnpoint resolve to tgeompoints via
tpose_to_tpoint / tnpoint_to_tgeompoint, then the existing 3-arg
edwithin/adwithin _tgeo_geo / _tgeo_tgeo calls run with the query-level
distance constant.

8 ops: Temporal{E,A}DWithin{TPoseGeometry,TPoseTPose,TNpointGeometry,TNpointTNpoint}.

Generator: 4 new with-dist composition templates (the W14/W15/W18 bodies
plus a trailing double dist forwarded to the MEOS call) + build_* flags.
No new parser dispatch — dispatch_case_for reuses the existing with-dist
dispatches by arity/constant pattern (distance is a lifted SQL constant;
tpose×geo/tpose×tpose match the 6-arg/9-arg tcbuffer-with-dist cases,
tnpoint the 5-arg/7-arg tgeo cases).

Systest: Tests/Functions/edwithin_tpose_tpose.test.

Local verification (nes-development:mobilitynebula-v2): nes-physical-operators,
nes-logical-operators, nes-sql-parser all link clean.
… ops + 1 systest)

Rounds out the tcbuffer distance surface (spatial-rels W10-W12, dwithin
W13). Like the tpose/tnpoint nad (W19), no new generator template: nad has
the same (Temporal*, ...) -> scalar shape as the tcbuffer spatial-rels, so
the existing tcbuffer templates are reused with a double (FLOAT64) return.

3 ops: TemporalNADTCbuffer (nad_tcbuffer_geo), TemporalNADTCbufferCbuffer
(nad_tcbuffer_cbuffer), TemporalNADTCbufferTCbuffer (nad_tcbuffer_tcbuffer).
nad_tcbuffer_stbox deferred with the other TBox-arg variants.

Systest: Tests/Functions/nad_tcbuffer_tcbuffer.test (identical tcbuffers -> 0).

Local verification (nes-development:mobilitynebula-v2): nes-physical-operators,
nes-logical-operators, nes-sql-parser all link clean.
@estebanzimanyi force-pushed the feat/nebula-streaming-parity-harness branch 2 times, most recently from fc9eedb to afb102d (May 22, 2026 14:40)
…245->249)

Add the four single-field windowed extent aggregates that fold a scalar field
directly through a MEOS extent transition fn (no trajectory string, no parse)
and serialize via the external typed span wrappers:

  - FLOAT_EXTENT      float_extent_transfn       -> FLOATSPAN  -> floatspan_out
  - INT_EXTENT        int_extent_transfn         -> INTSPAN    -> intspan_out
  - BIGINT_EXTENT     bigint_extent_transfn      -> BIGINTSPAN -> bigintspan_out
  - TIMESTAMPTZ_EXTENT timestamptz_extent_transfn -> TSTZSPAN   -> tstzspan_out

New PHYSICAL_CPP_SCALARFOLD template reuses the tnumber (value, ts) HPP / ctor /
lift / combine / reset / cleanup verbatim; only lower() differs — the Span state
threads across events as an opaque pointer (NULL initial -> span_make on first,
span_expand in place after; one allocation, freed after serialize). Logical
layer + 2-arg parser glue + optimizer lowering reused unchanged
(final_stamp_type=VARSIZED). TIMESTAMPTZ_EXTENT converts the epoch field to
TimestampTz arithmetically.

Per [internal vs external API] the typed *span_out wrappers are used, not the
Datum-generic span_out (meos_internal.h); the operators include only meos.h.

Locally compile-verified (build_local.sh, EXIT=0). A systest per operator
exercises it end-to-end (rides CI's sanitizer/leak matrix); expected span text
captured from a faithful MEOS probe — note MEOS canonicalizes integer spans to
half-open upper+1 ([9,18]->[9,19), [9e8,1.8e9]->[...,1800000001)).

Feed: float/int/bigint/timestamptz_extent_transfn now wired (249/1945).
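
The fold and the integer canonicalization described above can be modelled in a
few lines. This is a minimal Python sketch, not the generated operator:
`extent_transfn` and `canonical_int_span` are hypothetical names that only mimic
the described behavior (NULL initial state, make on the first value, expand
afterwards, half-open upper+1 rendering for integer spans). The exact text MEOS
emits may differ in spacing.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # inclusive (lower, upper) while folding


def extent_transfn(state: Optional[Span], value: int) -> Span:
    """Fold one value into the running span (None models the NULL initial state)."""
    if state is None:
        return (value, value)  # first event: make the span
    lower, upper = state
    return (min(lower, value), max(upper, value))  # later events: expand


def canonical_int_span(span: Span) -> str:
    """Render an inclusive integer span in half-open form: [lo, hi] -> [lo, hi+1)."""
    lower, upper = span
    return f"[{lower}, {upper + 1})"


state: Optional[Span] = None
for v in (12, 9, 18, 15):
    state = extent_transfn(state, v)

print(canonical_int_span(state))
```

The state stays a single span regardless of how many events the window holds,
which is what makes the expected systest text depend only on the min and max of
the fed values.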
…249->304)

Wire the position/topological predicate family between a temporal and an
STBox/TBox query literal, both argument orders:
  - 21 temporal-first  left_tspatial_stbox(temp, box), overlaps_tnumber_tbox, …
  - 34 box-first        above_stbox_tspatial(box, temp), after_tbox_tnumber, …
Operations: left/right/above/below/front/back, before/after, the over*
half-predicates, adjacent/contains/contained/overlaps/same — all bool, over
tgeompoint (tspatial) and tfloat (tnumber).

Three surgical generator changes, reusing the proven per-event box-literal
assembler (no new templates):
  - build_descriptor: map the abstract `tspatial` token -> tgeompoint builder
    (unblocks the 21 tspatial-first; tnumber-first were already wired in W26).
  - build_descriptor.temporal_x_box: accept the box-first form (box, Temporal*),
    taking the box parser from the C arg type (STBox->stbox_in / TBox->tbox_in;
    bare Span* stays suffix-resolved to avoid tstzspan/numspan ambiguity), and
    flag box_first.
  - codegen_nebula.assemble_generic_physical: when box_first, emit the literal
    before the temporal in the MEOS call.

Locally compile-verified (build_local.sh, EXIT=0, 711 targets). Call order
inspection-verified: above_stbox_tspatial(arg0B, temp) vs
above_tspatial_stbox(temp, arg0B). Parser glue reused from W26's box-literal
path. Feed: +55 (304/1945).
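
The box_first switch amounts to a change of argument order at emission time. A
minimal Python sketch of that one decision, assuming a hypothetical helper
`emit_meos_call` (the real assemble_generic_physical is not shown in this PR
text and certainly does more):

```python
def emit_meos_call(fn: str, temporal: str, box_literal: str, box_first: bool) -> str:
    """Return the MEOS call text, placing the box literal first when box_first is set."""
    args = (box_literal, temporal) if box_first else (temporal, box_literal)
    return f"{fn}({', '.join(args)})"


# The two call orders quoted in the commit message.
print(emit_meos_call("above_stbox_tspatial", "temp", "arg0B", box_first=True))
print(emit_meos_call("above_tspatial_stbox", "temp", "arg0B", box_first=False))
```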
…>309)

FLOAT_UNION / INT_UNION / BIGINT_UNION / TIMESTAMPTZ_UNION: collect a window's
values into a deduplicated, sorted Set. Same scalar-fold mechanism as W28 but
the per-event *_union_transfn accumulates a Set state (not a Span), finalized
by set_union_finalfn into the canonical Set, serialized via the external typed
wrappers floatset_out / intset_out / bigintset_out / tstzset_out.

PHYSICAL_CPP_SETFOLD is derived from PHYSICAL_CPP_SCALARFOLD by an asserted
swap of only the serialize lambda (Set state + finalfn); the fold loop / lift /
combine / reset / cleanup stay byte-identical. Descriptor adds fold:"set" +
finalfn. TIMESTAMPTZ_UNION converts the epoch field to TimestampTz.

Locally compile-verified (build_local.sh, EXIT=0). Systest per operator with
probe-captured expected text (dedup+sort confirmed: float {12.5,18,9.25,12.5}
-> {9.25, 12.5, 18}). adapters/nebula.py token regex recognizes *_UNION.
Feed: +5 (4 union transfns + set_union_finalfn) = 309/1945.
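
The dedup+sort behavior confirmed by the probe can be illustrated with a small
model. This is a Python sketch under stated assumptions: `union_transfn` and
`union_finalfn` are hypothetical stand-ins for the per-event transition and the
finalizer, and where exactly MEOS deduplicates (per event or at finalize) is not
stated here, so the sketch simply does it at finalize.

```python
from typing import List, Optional


def union_transfn(state: Optional[List[float]], value: float) -> List[float]:
    """Accumulate one value into the set state (None models the NULL initial state)."""
    if state is None:
        state = []
    state.append(value)
    return state


def union_finalfn(state: List[float]) -> List[float]:
    """Produce the canonical set: duplicates removed, values ascending."""
    return sorted(set(state))


def set_out(values: List[float]) -> str:
    """Render the set as text; the formatting is illustrative only."""
    return "{" + ", ".join(f"{v:g}" for v in values) + "}"


state: Optional[List[float]] = None
for v in (12.5, 18, 9.25, 12.5):  # the float input quoted in the commit message
    state = union_transfn(state, v)

print(set_out(union_finalfn(state)))
```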
…egates

Add a `repeated string literals` slot to SerializableAggregationFunction so an
aggregate's query-literal constants (a windowed box/span/set predicate's
threshold operand; a meeting distance; a vid pair) survive plan serialization
instead of falling back to a hard-coded default.

- grpc/SerializableVariantDescriptor.proto: literals field (tag 5).
- AggregationLogicalFunctionRegistryArguments: std::vector<std::string> literals.
- FunctionSerializationUtil: populate args.literals from the proto in the
  TemporalSequence-shaped deserialize path (every MEOS aggregate uses it).

Additive and behavior-preserving for existing field-only aggregates (no
literals). Foundation for the windowed-extent predicate family (box/span/set
op against a query literal) and a follow-up that closes the PAIR_MEETING.dMeet /
CROSS_DISTANCE (vidA,vidB) round-trip gaps. Locally compile-verified
(build_local.sh, EXIT=0; proto regenerates the literals accessors).
Establish the efficient mechanism for windowed trajectory operators: instead of
a bespoke aggregate per MEOS function (each re-materializing the trajectory),
the per-group mini-trip trajectory is a first-class VARSIZED value (hex-WKB),
and the MEOS function library composes over it as stateless scalar operators —
the trajectory analogue of "all scalar functions compose on a float in the
window". One materialization, free composition; mirrors how the Flink/Kafka
JVM facade exposes the library over MEOS values.

codegen_nebula gains a `wkb_temporal` generic input: the operand is an upstream
VARSIZED hex-WKB MEOS value parsed via temporal_from_hexwkb, not a temporal
rebuilt from per-event scalar fields. Proof operator TPOINT_LENGTH_WKB applies
tpoint_length to such a value; trajectory functions like length are meaningful
only over the full trajectory the materialization provides, not a per-event
instant.

Locally compile-verified (build_local.sh, EXIT=0). Systest feeds a known
trajectory's hex-WKB and asserts its length (probe-confirmed lossless round-trip
temporal_from_hexwkb -> tpoint_length = 0.0131538303422). Feed unchanged
(tpoint_length already wired by the W16 aggregate); this is the composable
realization + the foundation for rolling the library over WKB values, plus a
windowing aggregate that emits the trajectory WKB.
…-WKB value

The value-producing half of the compose-over-values mechanism: a windowed
aggregate that materializes the per-group mini-trip as a SEQUENCE ([...], linear
interpolation) and emits its hex-WKB. This is the trajectory value the MEOS
function library composes over — its output is exactly the input that
TPOINT_LENGTH_WKB (and any wkb_temporal-input operator) consumes.

codegen_aggregations gains return_mode "wkb": derived from the tgeo scalar
template by swapping the empty-window write, the instant-set braces for sequence
brackets, and the finalize (temporal_as_hexwkb instead of extent+serialize) —
same asserted-swap pattern as the box-output mode.

Locally compile-verified (build_local.sh, EXIT=0). Systest asserts the exact
hex-WKB of a 3-point windowed trajectory (probe-matched), which is byte-identical
to the TPOINT_LENGTH_WKB systest input -> the two operators compose end-to-end
(TRAJECTORY_WKB -> tpoint_length = 0.0131538303422). Feed unchanged (value
producer via io-meta temporal_as_hexwkb; the composability substrate, not a
coverage symbol).
…->311)

The MEOS-native streaming aggregation, per the converged design: the aggregate
STATE is a live expandable Temporal* (a mini-trip trajectory) grown in place per
event via appendInstant; lower() applies the invariant MEOS scalar fn DIRECTLY
to the live trajectory — no per-event string build, no parse-the-window, no WKB.
This is how PG MobilityDB aggregates run; WKB is needed only when a value
crosses an operator boundary (the Flink/Kafka checkpointed-state form).

codegen_aggregations gains return_mode "expand" (PHYSICAL_CPP_TGEO_EXPAND): state
= Temporal* slot; lift builds an instant (tgeompoint_in, public) and
temporal_append_tinstant(..., expand=true) — doubles maxcount on append, so
amortized-O(1) without the internal pre-allocator (tsequence_make_exp /
tgeompointinst_in are internal and warrant promotion to the public API — a
MobilityDB MEOS-C follow-up); combine merges via temporal_merge; lower applies f;
cleanup frees. Double-pointer params are non-const in MEOS — cast (TInstant**),
never const.

Proof operator TLENGTH_EXP (tpoint_length over the live mini-trip). Locally
compile-verified (EXIT=0); systest asserts the length, probe-confirmed identical
across the expandable, string-parse, and WKB constructions (0.0131538303422).
Feed +2: temporal_append_tinstant + temporal_merge now wired (the streaming
primitives the expandable accumulator uses, which the PagedVector path did not).

Doc: methodology gains a "Running-aggregation realization" section contrasting
Flink/Kafka (WKB-serialized checkpointed state + JMEOS facade) with NebulaStream
(in-process expandable Temporal* + direct API) — same scope, different state model.
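
The amortized-O(1) claim for doubling maxcount can be checked with a toy model.
This is a Python sketch, not MEOS code: `ExpandableSequence` is a hypothetical
class, and the initial capacity of 4 is an assumption chosen for illustration.

```python
class ExpandableSequence:
    """Toy model of an expandable sequence that doubles its capacity when full."""

    def __init__(self, initial_capacity: int = 4) -> None:
        self.maxcount = initial_capacity
        self.count = 0
        self.slots = [None] * self.maxcount
        self.reallocations = 0  # how many times the storage was regrown

    def append(self, instant) -> None:
        if self.count == self.maxcount:
            # Full: double the capacity and copy the existing instants once.
            self.maxcount *= 2
            new_slots = [None] * self.maxcount
            new_slots[: self.count] = self.slots
            self.slots = new_slots
            self.reallocations += 1
        self.slots[self.count] = instant
        self.count += 1


seq = ExpandableSequence()
for i in range(1000):
    seq.append(("point", i))

print(seq.count, seq.maxcount, seq.reallocations)
```

A thousand appends trigger only a handful of regrowths, so the copy cost per
append averages out to a constant.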
…11->316)

Complete the mechanism toolkit: a value-OUTPUT finalize for the expandable
substrate — f(live mini-trip) -> Temporal* result, serialized to hex-WKB as
VARSIZED (the proven box-output VARSIZED tail). The MEOS library's
Temporal-returning single-temporal transforms become windowed aggregates over
the expandable trajectory, which the per-event path could not emit (it only
returned scalars).

codegen_aggregations gains return_mode "expand_wkb" (PHYSICAL_CPP_TGEO_EXPAND_WKB,
derived from the expand template by an asserted swap of only the lower()).
Wires TGEO_CENTROID / TPOINT_AZIMUTH / TPOINT_ANGULAR_DIFFERENCE /
TGEOMPOINT_TO_TGEOMETRY / TEMPORAL_COPY over the windowed mini-trip.

Locally compile-verified (EXIT=0). Systest TEMPORAL_COPY_EXP asserts the
result hex-WKB (probe-confirmed: the expandable tsequence_make+appendInstant
sequence serializes byte-identically to TRAJECTORY_WKB). Feed +5
(tgeo_centroid, tpoint_azimuth, tpoint_angular_difference,
tgeompoint_to_tgeometry, temporal_copy) = 316/1945.
Extend the expandable value-output substrate to the tnumber input shape: the
per-event instant is a tfloat ("value@ts" via tfloat_in), accumulated by
appendInstant into the expandable Temporal*; lower() applies the invariant fn
and serializes the result temporal to hex-WKB. Derived from the tgeo expand-wkb
template by swapping only the ctor + lift (the Temporal*-slot lower / reset /
cleanup / value-output finalize are input-shape-independent).

Wires the tnumber Temporal-returning transforms over the windowed tfloat series:
TNUMBER_ABS / TNUMBER_DELTA_VALUE / TNUMBER_ANGULAR_DIFFERENCE /
TEMPORAL_DERIVATIVE / TEMPORAL_AT_MAX / TEMPORAL_AT_MIN / TEMPORAL_MINUS_MAX /
TEMPORAL_MINUS_MIN.

Locally compile-verified (EXIT=0). Systest TNUMBER_ABS_EXP asserts the
result hex-WKB (probe-confirmed). Feed +8 = 324/1945.
Every generated MEOS windowed-aggregate operator failed at query-plan
deserialization on the worker, so none of their systests actually executed
end-to-end. Three root causes, each surfaced by running the systests against a
locally-built single-node worker:

1. Serialized type vs registry key mismatch. The logical function serialized
   set_type(NAME) with NAME = the SQL token (e.g. "TLENGTH_EXP"), but the
   registry key is the add_plugin target = the PascalCase operator name
   ("TLengthExp"). deserializeWindowAggregationFunction therefore called
   create("TLENGTH_EXP") against a registry keyed by "TLengthExp" and threw
   UnknownLogicalOperator. The optimizer-lowering match (name == "...") had the
   same SQL-token spelling, so even a fixed NAME would not have lowered.
   Fix: NAME and the optimizer match now use the PascalCase name (matching the
   built-in Count/TemporalLength convention); the SQL spelling stays in the
   lexer/parser. codegen normalizes class_name_token = nebula_name so a spec
   value cannot reintroduce the divergence.

2. Two-field (value, timestamp) arity. serializeTemporalSequence only has a
   four-field (lon, lat, ts, as) form, so the tnumber shape packs the value
   field twice [value, ts, value, as]; the registrar required exactly three and
   threw CannotDeserialize ("...got 4"). Fix: the registrar reads four fields
   and uses [0]=value, [1]=ts, [3]=alias (the duplicate [2] is ignored).

3. Union double free. set_union_finalfn pfree()s its state internally and
   returns a new Set; the SETFOLD finalize also called free(state) afterwards,
   crashing the worker with "double free or corruption". Fix: drop the extra
   free; only the finalfn result is freed.

All 14 pre-existing windowed-aggregate systests (extent/union/value-output
families across the tgeo and tnumber shapes) now pass against a local worker.
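
Root cause 2 is easiest to see as a round trip through the four-field form. A
minimal Python sketch, assuming hypothetical helpers `pack_tnumber_fields` and
`read_tnumber_fields` that stand in for the serializer and the fixed registrar:

```python
from typing import Dict, List


def pack_tnumber_fields(value_field: str, ts_field: str, alias: str) -> List[str]:
    """Pack a (value, ts) aggregate into the four-field form by repeating the value field."""
    return [value_field, ts_field, value_field, alias]


def read_tnumber_fields(fields: List[str]) -> Dict[str, str]:
    """Read the four-field form: [0]=value, [1]=ts, [3]=alias; the duplicate [2] is ignored."""
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    return {"value": fields[0], "ts": fields[1], "alias": fields[3]}


packed = pack_tnumber_fields("speed", "ts", "abs_speed")
print(read_tnumber_fields(packed))
```

The field names in the usage line are made up for the example.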
Three windowed value-output aggregates over a network-constrained mini-trip
grown on the in-process expandable Temporal* (appendInstant), each resolving
npoint route+fraction against the loaded ways network and emitting the result
as hex-WKB:

  - TNPOINT_CUMULATIVE_LENGTH_EXP  tnpoint_cumulative_length
  - TNPOINT_SPEED_EXP              tnpoint_speed
  - TNPOINT_TO_TGEOMPOINT_EXP      tnpoint_to_tgeompoint (network-resolved
                                   spatial trajectory)

The lift reuses the three-field tgeo glue (the three args are rid, frac, ts)
and builds each instant with tnpoint_in; codegen gains an expand_wkb_tnpoint
return mode (PHYSICAL_CPP_TNPOINT_EXPAND_WKB) that swaps the tgeo lift for the
tnpoint lift and adds the meos_npoint.h include. The operators link the same
libmeos as the rest of the engine, whose default ways CSV resolves the routes
at runtime; a systest per operator runs end-to-end against a local worker with
the network loaded, with the exact hex captured from a faithful MEOS probe.

The parity adapter counts a conversion helper (tnpoint_to_tgeompoint) as the
wired op when an operator is named for it (a dedicated conversion), keeping it
plumbing-excluded only when it co-occurs with another streamable call.

Feed: tnpoint_cumulative_length / tnpoint_speed / tnpoint_to_tgeompoint now
wired (327/1945).
A plain (non-windowed) SELECT projects its AS-aliased field unqualified, so the
sink must declare the computed field as `len`, not `tlw.len`; the qualified form
made the sink schema (TLW$LEN) diverge from the projection (LEN) and the query
failed to bind. Passthrough source fields stay qualified (tlw.id). Matches the
convention of the other per-event function systests (e.g. at_geometry).
…ven count

The proven (L3-callable) mapping relied on a parser-dispatch token map that only
captures per-event functions, so passing windowed-aggregate systests (…_EXP,
…_WKB, …_EXTENT/UNION, TNPOINT_…) did not register as callable. Resolve a
systest's SQL token to its operator by normalizing both to underscore-free
lowercase against the operator keys (dropping the Aggregation suffix) — the same
measured-not-guessed basis as the wired side, with no per-family pattern. With
the windowed-aggregate systests now passing end-to-end, proven rises from 6 to
38 distinct MEOS calls.
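
The normalized-name resolution can be sketched as follows. This is a Python
illustration only, not the code in adapters/nebula.py: `normalize_operator`,
`normalize_token` and `resolve_token` are hypothetical names, and
"TNumberAbsExpAggregation" is an invented operator key used to show the suffix
drop ("TLengthExp" is the key quoted earlier in this PR).

```python
from typing import Iterable, Optional

SUFFIX = "aggregation"


def normalize_token(token: str) -> str:
    """SQL token -> underscore-free lowercase (TLENGTH_EXP -> tlengthexp)."""
    return token.replace("_", "").lower()


def normalize_operator(name: str) -> str:
    """Operator key -> underscore-free lowercase, without a trailing Aggregation suffix."""
    key = name.replace("_", "").lower()
    if key.endswith(SUFFIX) and len(key) > len(SUFFIX):
        key = key[: -len(SUFFIX)]
    return key


def resolve_token(token: str, operator_names: Iterable[str]) -> Optional[str]:
    """Return the operator whose normalized key equals the normalized token, if any."""
    index = {normalize_operator(op): op for op in operator_names}
    return index.get(normalize_token(token))


operators = ["TLengthExp", "TNumberAbsExpAggregation"]
print(resolve_token("TLENGTH_EXP", operators))
print(resolve_token("TNUMBER_ABS_EXP", operators))
print(resolve_token("NOT_WIRED", operators))
```

Because the match is purely on normalized names, no per-family pattern is
needed, and an unmatched token resolves to nothing rather than to a guess.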
Reflect the measured state: 327 / 1,945 wired and 38 confirmed callable via
systests that run end-to-end on a local single-node worker. Describe the
value-output expand band (TRAJECTORY_WKB / TLENGTH_EXP / TEMPORAL_COPY_EXP /
TNUMBER_ABS_EXP) and the network-constrained tnpoint aggregates
(TNPOINT_CUMULATIVE_LENGTH_EXP / TNPOINT_SPEED_EXP / TNPOINT_TO_TGEOMPOINT_EXP,
resolved against the loaded ways network), and the normalized-name token
resolution behind the callable count.
…332)

Five windowed value-output aggregates that apply a single-argument MEOS temporal
transform to the per-group mini-trip grown on the in-process expandable
Temporal*, emitting the result as hex-WKB (the expand_wkb substrate):

  - TPOINT_CUMULATIVE_LENGTH_EXP  tpoint_cumulative_length  (tgeompoint -> tfloat)
  - TPOINT_SPEED_EXP              tpoint_speed              (tgeompoint -> tfloat)
  - TPOINT_GET_X_EXP              tpoint_get_x              (tgeompoint -> tfloat)
  - TPOINT_GET_Y_EXP              tpoint_get_y              (tgeompoint -> tfloat)
  - TNUMBER_TREND_EXP             tnumber_trend             (tnumber -> tint)

The four tgeompoint transforms reuse the tgeo (lon,lat,ts) lift and meos_geo.h;
TNUMBER_TREND_EXP reuses the tnumber (value,ts) lift. No new template or include.
A systest per operator runs end-to-end against a local worker, with the exact
hex captured from a faithful MEOS probe. All 44 MEOS systests pass.

Feed: 332 / 1,945 wired, 43 confirmed callable.
Add the single-argument temporal-transform value-output family
(TPOINT_CUMULATIVE_LENGTH_EXP / TPOINT_SPEED_EXP / TPOINT_GET_X_EXP /
TPOINT_GET_Y_EXP / TNUMBER_TREND_EXP) and update the measured counts.
…timezone)

MEOS is thread-safe via thread-local state: session_timezone, the timezone
cache, and the PROJ / ways / GSL caches are MEOS_TLS, so meos_initialize() sets
up the calling thread. The wrapper guarded initialization with a process-global
flag, so only the first thread was initialized; the engine runs operator
pipelines on a worker thread pool, leaving those threads with a NULL
session_timezone. The first text-timestamp serialization on such a thread
(tstzspan_out / stbox_out / tbox_out -> timestamp_out_common -> localsub) then
dereferenced the null timezone and segfaulted — nondeterministically, depending
on which worker thread ran the aggregate's lower(). The hex-WKB path encodes raw
microseconds and never reads the timezone, which is why only the text-output
extent/span/union aggregates crashed while the value-output ones were stable.

ensureMeosInitialized() now initializes per thread (thread_local guard); the
timezone-environment setup runs once via std::call_once and the per-thread
meos_initialize() calls are serialized so their process-global initializations
(PROJ / GEOS / error handler) do not race. The windowed-aggregate templates and
generated operators call ensureMeosInitialized() inside each runtime MEOS lambda
(the per-event lift already did; the fold/finalize lambdas now do too), so the
worker thread is always initialized before any MEOS call.

timestamptz_extent and tspatial_extent, which previously segfaulted in roughly
one run in five, now pass 20/20 in isolation, and the full 48-test MEOS suite
passes twice with zero failures.
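
The guard structure (per-thread flag, one-time environment setup, serialized
initialization) can be modelled outside the engine. This is a Python sketch of
the pattern only: the real wrapper is C++ (thread_local and std::call_once), and
`_meos_initialize_stub` is a stand-in that records the calling thread instead of
calling MEOS.

```python
import threading

_tls = threading.local()        # per-thread "initialized" flag
_env_lock = threading.Lock()
_init_lock = threading.Lock()   # serializes the per-thread initializations
_env_done = False
env_setups = 0                  # how many times the one-time setup ran
initialized_threads = []        # one entry per thread that was initialized


def _setup_timezone_env_once() -> None:
    """Run the process-wide environment setup exactly once."""
    global _env_done, env_setups
    with _env_lock:
        if not _env_done:
            env_setups += 1
            _env_done = True


def _meos_initialize_stub() -> None:
    """Stand-in for the per-thread initialization call."""
    initialized_threads.append(threading.current_thread().name)


def ensure_meos_initialized() -> None:
    """Initialize the calling thread if it has not been initialized yet."""
    if getattr(_tls, "initialized", False):
        return
    _setup_timezone_env_once()
    with _init_lock:
        _meos_initialize_stub()
    _tls.initialized = True


def worker() -> None:
    ensure_meos_initialized()   # e.g. at the top of a lift lambda
    ensure_meos_initialized()   # e.g. at the top of a lower lambda: a no-op


threads = [threading.Thread(target=worker, name=f"worker-{i}") for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(initialized_threads), env_setups)
```

A process-global flag would have recorded only the first worker; the per-thread
flag records all four while the environment setup still runs once.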
Four windowed aggregates that reduce the per-group mini-trip (grown on the
in-process expandable Temporal*) to a geometry value, serialized as canonical
hex-EWKB via geo_out:

  - TGEO_START_VALUE_EXP   tgeo_start_value   (first point)
  - TGEO_END_VALUE_EXP     tgeo_end_value     (last point)
  - TGEO_CONVEX_HULL_EXP   tgeo_convex_hull   (trajectory convex hull)
  - TPOINT_TWCENTROID_EXP  tpoint_twcentroid  (time-weighted centroid)

codegen gains an expand_geo_wkb return mode (PHYSICAL_CPP_TGEO_EXPAND_GEO_WKB),
derived from the temporal value-output template by swapping the finalize for a
GSERIALIZED + geo_out (the lift/slot/append are unchanged). A systest per
operator runs end-to-end against a local worker with the exact hex from a
faithful MEOS probe; the full 48-test MEOS suite passes.

Feed: 336 / 1,945 wired, 47 confirmed callable.
Add the geometry value-output family (TGEO_START_VALUE_EXP / TGEO_END_VALUE_EXP /
TGEO_CONVEX_HULL_EXP / TPOINT_TWCENTROID_EXP) and update the measured counts.
PAIR_MEETING (geog_dwithin) and CROSS_DISTANCE (nad_tgeo_tgeo) realize the
cross-stream tier via a per-group state map + pairwise enumeration in lower();
the general shape is f(trajA, trajB) over two groups' windowed mini-trips (the
SNCB box-overlap alert = overlaps_stbox_stbox on each group's tspatial_extent).
Record the not-wired boundary: the binary_temporal family beyond those two — 58
functions in five output shapes (Temporal* 27, bool/int 21, nai 4, shortestline
4, nad 2). Also add the geometry value-output family to the status.
Observations group per vehicle (GROUP BY vehicle_id), so a cross-vehicle alert is
the per-vehicle windowed aggregate composed with a cross-vehicle comparison: each
vehicle's TSPATIAL_EXTENT box self-joined with the per-event predicate
overlaps_stbox_stbox. The overall (all-vehicles) view is a derivation over the
per-vehicle aggregates, not a separate aggregate; PAIR_MEETING / CROSS_DISTANCE
are the BerlinMOD-scaffold single-aggregate convenience. The cross-vehicle
comparison family (58 fns) is the f(A,B) predicates/transforms over two
per-vehicle aggregate outputs.
…336->365)

The production-faithful cross-stream realization: observations group per vehicle
(GROUP BY vehicle_id) into per-vehicle running extents (TSPATIAL_EXTENT), and the
cross-vehicle comparison is a per-event predicate over two extent boxes in a
self-join — not a single all-vehicles aggregate. The overall view is a derivation
over the per-vehicle aggregates.

Wire the 29 STBox-vs-STBox functions as per-event operators taking two STBox
VARSIZED inputs (each parsed via stbox_in, freed): the 22 topological/position
predicates (overlaps/contains/contained/left/right/above/below/front/back/
adjacent/same + over* + nad) and the 7 comparators (stbox_eq/ne/lt/le/gt/ge,
stbox_cmp). codegen_nebula gains a `stbox_text` generic input (STBox from a
VARSIZED text field) reusing the existing `box` extra-arg for the second box;
build_descriptor gains the `stbox_x_stbox` classifier. `overlaps_stbox_stbox`
over two TSPATIAL_EXTENT boxes is the SNCB box-overlap alert.

Systests overlaps_stbox_stbox + left_stbox_stbox run end-to-end on a local worker;
the full 50-test MEOS suite passes. Feed: 365 / 1,945 wired, 49 confirmed callable.
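
The per-vehicle extent plus self-join shape can be illustrated without the
engine. This is a Python sketch under assumptions: `Box` is a simplified
x/y/t box, `overlaps` is a plain interval-overlap test standing in for
overlaps_stbox_stbox, and the vehicle ids and coordinates are invented.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Tuple


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    tmin: float
    tmax: float


def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes intersect in x, y and t."""
    return (
        a.xmin <= b.xmax and b.xmin <= a.xmax
        and a.ymin <= b.ymax and b.ymin <= a.ymax
        and a.tmin <= b.tmax and b.tmin <= a.tmax
    )


def overlap_alerts(extents: Dict[str, Box]) -> List[Tuple[str, str]]:
    """Self-join the per-vehicle extents and keep the pairs whose boxes overlap."""
    pairs = combinations(sorted(extents.items()), 2)
    return [(va, vb) for (va, a), (vb, b) in pairs if overlaps(a, b)]


# One windowed extent per vehicle (the GROUP BY vehicle_id output).
extents = {
    "veh1": Box(0, 0, 2, 2, 0, 10),
    "veh2": Box(1, 1, 3, 3, 5, 15),
    "veh3": Box(5, 5, 6, 6, 0, 10),
}
print(overlap_alerts(extents))
```

The comparison only ever sees two boxes at a time, which is why it fits a
per-event predicate rather than a single all-vehicles aggregate.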
…e predicates

The 29 STBox-vs-STBox cross-vehicle functions are wired as per-event operators
(two STBox VARSIZED inputs); overlaps_stbox_stbox over two TSPATIAL_EXTENT boxes
is the SNCB box-overlap alert. The remaining cross-vehicle gap is the
f(trajA,trajB) temporal family over two per-vehicle temporal outputs.
…PagedVector

The extent/fold/union aggregates buffered every raw event in a PagedVector and
folded once in lower() (O(N) state) — and TSPATIAL_EXTENT additionally rebuilt a
"{p1, p2, ...}" trajectory string and re-parsed it. Convert it to the same
incremental MEOS-accumulator-slot model the value-output operators already use:
an STBox* slot in the AggregationState, folded per event in lift() via
tspatial_extent_transfn (O(1) state, no event buffer, no string round-trip).
combine() merges two partial boxes via union_stbox_stbox (stbox_copy for a null
slot); lower() serializes stbox_out and frees the slot; reset()/cleanup() null
and free it; getSizeOfStateInBytes() is sizeof(STBox*).

This is the per-vehicle box that the cross-vehicle overlaps_stbox_stbox predicate
consumes — now built with the streaming-optimal accumulator. Output is identical:
tspatial_extent passes byte-for-byte and the full 50-test MEOS suite is green.
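
The lift/combine/lower contract of the accumulator slot can be modelled with a
plain 2D bounding box. This is a Python sketch, not the generated operator: the
function names mirror the engine's phases, None models a null slot, and the
text produced by `lower` is an illustrative format rather than stbox_out's.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def lift(slot: Optional[Box], x: float, y: float) -> Box:
    """Fold one event into the slot; the state stays one box, however many events arrive."""
    if slot is None:
        return (x, y, x, y)
    return (min(slot[0], x), min(slot[1], y), max(slot[2], x), max(slot[3], y))


def combine(a: Optional[Box], b: Optional[Box]) -> Optional[Box]:
    """Merge two partial boxes; a null side yields a copy of the other."""
    if a is None or b is None:
        return a if b is None else b
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def lower(slot: Optional[Box]) -> str:
    """Serialize the final box (empty string for an empty window, by assumption)."""
    if slot is None:
        return ""
    return f"BOX({slot[0]} {slot[1]}, {slot[2]} {slot[3]})"


events = [(4.35, 50.85), (4.36, 50.84), (4.34, 50.86), (4.37, 50.83)]

# Two partial states, as two threads or slices would build them.
left: Optional[Box] = None
for x, y in events[:2]:
    left = lift(left, x, y)
right: Optional[Box] = None
for x, y in events[2:]:
    right = lift(right, x, y)

print(lower(combine(left, right)))
```

Combining the partials gives the same box as folding every event into one slot,
which is the property that lets the operator drop the event buffer.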
…gedVector

Same conversion as TSPATIAL_EXTENT, for the value/time box: a TBox* slot in the
AggregationState folded per event via tnumber_extent_transfn (tfloat_in instant),
combine merging via union_tbox_tbox (tbox_copy for a null slot), lower serializing
tbox_out and freeing the slot. O(1) state, no event buffer, no "{v1@t1,...}"
string round-trip. Completes the box-extent family's uniformization to the
incremental MEOS-accumulator slot. tnumber_extent passes byte-for-byte; the full
50-test MEOS suite is green.
…ot a PagedVector

FLOAT/INT/BIGINT/TIMESTAMPTZ_EXTENT converted from PagedVector-buffer-all to the
uniform incremental MEOS-accumulator slot: a Span* in the AggregationState folded
per event in lift() via the typed *_extent_transfn (tfloat_in/value cast;
TIMESTAMPTZ converts the epoch to TimestampTz), combine() merging two partial
spans into their bounding span via super_union_span_span(s1, s2, false), lower()
serializing the typed *span_out and freeing the slot. O(1) state, no event buffer.

Uses the newly-public super_union_span_span (MobilityDB d35ac2a0a — the span
analog of union_stbox_stbox/union_tbox_tbox) rather than a spanset_span(
union_span_span(...)) workaround, per the proper-MEOS-export principle. Built
against the dev image carrying that public declaration; all four systests pass
byte-for-byte and the full 50-test MEOS suite is green.
…a PagedVector

FLOAT/INT/BIGINT/TIMESTAMPTZ_UNION converted to the uniform incremental
MEOS-accumulator slot: a Set* in the AggregationState folded per event in lift()
via the typed *_union_transfn, combine() merging two partial sets via
set_union_transfn (creates from the other when one is null), lower() applying
set_union_finalfn (which consumes the slot) and serializing the deduplicated
sorted set via the typed *set_out. O(1)-amortized state, no event buffer.

Completes the extent+union family's uniformization (box STBox/TBox, span Span,
set Set — 10 operators) to the incremental slot the value-output operators
already use. All four systests pass byte-for-byte; the full 50-test MEOS suite
is green.
Describe the uniform construction: every single-accumulator windowed aggregate
folds one MEOS accumulator (Temporal* / STBox* / TBox* / Span* / Set*) per event
in lift() with O(1) state, merged in combine() (union_stbox_stbox /
union_tbox_tbox / super_union_span_span / set_union_transfn), serialized in
lower() — replacing the per-family description of folding a buffered window.
… two hex-WKB inputs)

Wire the 13 tnumber-vs-tnumber position/topological predicates as per-event
operators taking two temporal operands carried as hex-WKB VARSIZED fields (each
temporal_from_hexwkb-parsed, freed): adjacent/after/before/contained/contains/
left/right/overafter/overbefore/overlaps/overleft/overright/same_tnumber_tnumber
-> bool. These compose two per-vehicle tnumber aggregate outputs in a self-join —
the value/time counterpart of the STBox cross-vehicle predicates.

codegen_nebula gains a `wkb_temporal` extra-arg kind (a second hex-WKB temporal
operand); build_descriptor gains the `two_temporal_scalar` classifier. Systests
overlaps_tnumber_tnumber + left_tnumber_tnumber run end-to-end on a local worker;
the full 52-test MEOS suite passes. Wired surface now 384 / 1,945, 57 callable
(the +6 beyond the 13 added here are the merge functions that the uniformized
extent/union combine() now invokes: union_stbox_stbox / union_tbox_tbox /
super_union_span_span / set_union_transfn, all streamable).
…cle predicates

The 13 tnumber-vs-tnumber position predicates are wired as per-event operators
over two hex-WKB temporal operands (two_temporal_scalar classifier + wkb_temporal
extra-arg) — the value/time counterpart of the STBox cross-vehicle predicates.
Counts updated to the measured 384 wired / 57 callable (incl. the merge functions
the uniformized extent/union combine() invokes). Remaining cross-vehicle gap = the
Temporal*-returning f(trajA,trajB) family (tdistance/tcontains/nai/shortestline).